ARRAYS OF NUCLEIC ACID PROBES ON BIOLOGICAL CHIPS

Cross-Reference to Related Application

This application is a continuation-in-part of USSN
08/284,064, filed August 2, 1994, which is a
continuation-in-part of USSN 08/143,312, filed October 26,
1993, each of which is incorporated by reference in its
entirety for all purposes. Research leading to the invention
was funded in part by NIH grant No. 1R01HG00813-01, and the
government may have certain rights to the invention.

Background of the Invention 
Field of the Invention

The present invention provides arrays of oligonucleotide
probes immobilized in microfabricated patterns on silica chips
for analyzing molecular interactions of biological interest.
The invention therefore relates to diverse fields impacted by
the nature of molecular interaction, including chemistry,
biology, medicine, and medical diagnostics.

Description of Related Art

Oligonucleotide probes have long been used to detect
complementary nucleic acid sequences in a nucleic acid of
interest (the "target" nucleic acid). In some assay formats,
the oligonucleotide probe is tethered, i.e., by covalent
attachment, to a solid support, and arrays of oligonucleotide
probes immobilized on solid supports have been used to detect
specific nucleic acid sequences in a target nucleic acid.
See, e.g., PCT patent publication Nos. WO 89/10977 and
89/11548. Others have proposed the use of large numbers of
oligonucleotide probes to provide the complete nucleic acid
sequence of a target nucleic acid but failed to provide an
enabling method for using arrays of immobilized probes for
this purpose. See U.S. Patent Nos. 5,202,231 and 5,002,867
and PCT patent publication No. WO 93/17126.

The development of VLSIPS™ technology has provided
methods for making very large arrays of oligonucleotide probes
in very small arrays. See U.S. Patent No. 5,143,854 and PCT
patent publication Nos. WO 90/15070 and 92/10092, each of
which is incorporated herein by reference. U.S. Patent
application Serial No. 082,937, filed June 25, 1993, describes
methods for making arrays of oligonucleotide probes that can
be used to provide the complete sequence of a target nucleic
acid and to detect the presence of a nucleic acid containing a
specific nucleotide sequence.

Microfabricated arrays of large numbers of
oligonucleotide probes, called "DNA chips," offer great promise
for a wide variety of applications. New methods and reagents
are required to realize this promise, and the present
invention helps meet that need.

SUMMARY OF THE INVENTION 
The invention provides several strategies employing
immobilized arrays of probes for comparing a reference
sequence of known sequence with a target sequence showing
substantial similarity with the reference sequence, but
differing in the presence of, e.g., mutations. In a first
embodiment, the invention provides a tiling strategy employing
an array of immobilized oligonucleotide probes comprising at
least two sets of probes. A first probe set comprises a
plurality of probes, each probe comprising a segment of at
least three nucleotides exactly complementary to a subsequence
of the reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence. A second probe set
comprises a corresponding probe for each probe in the first
probe set, the corresponding probe in the second probe set
being identical to a sequence comprising the corresponding
probe from the first probe set or a subsequence of at least
three nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the two corresponding probes from the first and
second probe sets. The probes in the first probe set have at
least two interrogation positions corresponding to two
contiguous nucleotides in the reference sequence. One
interrogation position corresponds to one of the contiguous
nucleotides, and the other interrogation position to the
other.

In a second embodiment, the invention provides a tiling
strategy employing an array comprising four probe sets. A
first probe set comprises a plurality of probes, each probe
comprising a segment of at least three nucleotides exactly
complementary to a subsequence of the reference sequence, the
segment including at least one interrogation position
complementary to a corresponding nucleotide in the reference
sequence. Second, third and fourth probe sets each comprise a
corresponding probe for each probe in the first probe set.
The probes in the second, third and fourth probe sets are
identical to a sequence comprising the corresponding probe
from the first probe set or a subsequence of at least three
nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets. The first probe set often has at least 100
interrogation positions corresponding to 100 contiguous
nucleotides in the reference sequence. Sometimes the first
probe set has an interrogation position corresponding to every
nucleotide in the reference sequence. The segment of
complementarity within the probe set is usually about 9-21
nucleotides. Although probes may contain leading or trailing
sequences in addition to the 9-21 nucleotide segment, many
probes consist exclusively of a 9-21 nucleotide segment of
complementarity.
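
The four-probe-set tiling just described can be illustrated with a short sketch. It is not taken from the specification: the function names, the default probe length of 15 and the 0-based interrogation index of 7 are assumptions chosen for illustration, and probes are written as the base-by-base complement of the reference in the same left-to-right order so that probe index i pairs with reference index i.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complement(seq):
    """Base-by-base complement, kept in the same order as the input."""
    return "".join(COMPLEMENT[b] for b in seq)


def tile(reference, probe_len=15, interrogation=7):
    """Return a list of (reference_index, {base: probe}) columns.

    Each column holds four probes that are identical except at the
    interrogation position, which carries A, C, G or T. Exactly one of
    the four is the perfect complement of the reference window.
    """
    columns = []
    for start in range(len(reference) - probe_len + 1):
        wildtype = complement(reference[start:start + probe_len])
        column = {
            base: wildtype[:interrogation] + base + wildtype[interrogation + 1:]
            for base in BASES
        }
        columns.append((start + interrogation, column))
    return columns
```

Sliding the window one nucleotide at a time gives one interrogation position per reference nucleotide covered, which is how a probe set can come to have 100 or more contiguous interrogation positions.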

In a third embodiment, the invention provides immobilized
arrays of probes tiled for multiple reference sequences. One
such array comprises at least one pair of first and second
probe groups, each group comprising first and second sets of
probes as defined in the first embodiment. Each probe in the
first probe set from the first group is exactly complementary
to a subsequence of a first reference sequence, and each probe
in the first probe set from the second group is exactly
complementary to a subsequence of a second reference sequence.
Thus, the first group of probes is tiled with respect to a
first reference sequence and the second group of probes with
respect to a second reference sequence. Each group of probes
can also include third and fourth sets of probes as defined in
the second embodiment. In some arrays of this type, the
second reference sequence is a mutated form of the first
reference sequence.

In a fourth embodiment, the invention provides arrays for
block tiling. Block tiling is a species of the general tiling
strategies described above. The usual unit of a block tiling
array is a group of probes comprising a wildtype probe, a
first set of three mutant probes and a second set of three
mutant probes. The wildtype probe comprises a segment of at
least three nucleotides exactly complementary to a subsequence
of a reference sequence. The segment has at least first and
second interrogation positions corresponding to first and
second nucleotides in the reference sequence. The probes in
the first set of three mutant probes are each identical to a
sequence comprising the wildtype probe or a subsequence of at
least three nucleotides thereof including the first and second
interrogation positions, except in the first interrogation
position, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe. The probes
in the second set of three mutant probes are each identical to
a sequence comprising the wildtype probe or a subsequence of
at least three nucleotides thereof including the first and
second interrogation positions, except in the second
interrogation position, which is occupied by a different
nucleotide in each of the three mutant probes and the wildtype
probe.
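
As a minimal sketch of one block-tiling unit, assuming the simplest case in which each mutant probe is the full-length wildtype probe with a single substitution (the function name and 0-based positions are illustrative, not from the specification):

```python
BASES = "ACGT"


def block_tile(wildtype, positions):
    """Return {position: [three mutant probes]} for one wildtype probe.

    Each mutant differs from the wildtype probe only at its own
    interrogation position, so a single wildtype probe is shared by all
    of the mutant sets in the block.
    """
    block = {}
    for pos in positions:
        block[pos] = [
            wildtype[:pos] + base + wildtype[pos + 1:]
            for base in BASES
            if base != wildtype[pos]
        ]
    return block
```

A block with k interrogation positions therefore needs 1 + 3k probes rather than the 4k used when every position has its own wildtype probe.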

In a fifth embodiment, the invention provides methods of
comparing a target sequence with a reference sequence using
arrays of immobilized pooled probes. The arrays employed in
these methods represent a further species of the general
tiling arrays noted above. In these methods, variants of a
reference sequence differing from the reference sequence in at
least one nucleotide are identified and each is assigned a
designation. An array of pooled probes is provided, with each
pool occupying a separate cell of the array. Each pool
comprises a probe comprising a segment exactly complementary
to each variant sequence assigned a particular designation.
The array is then contacted with a target sequence comprising
a variant of the reference sequence. The relative
hybridization intensities of the pools in the array to the
target sequence are determined. The identity of the target
sequence is deduced from the pattern of hybridization
intensities. Often, each variant is assigned a designation
having at least one digit and at least one value for the
digit. In this case, each pool comprises a probe comprising a
segment exactly complementary to each variant sequence
assigned a particular value in a particular digit. When
variants are assigned successive numbers in a numbering system
of base m having n digits, n x (m-1) pooled probes are used
to assign each variant a designation.
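
The n x (m-1) count can be made concrete with a small sketch. It rests on an assumption not spelled out above: that only the non-zero values of each digit receive a pool, a zero being inferred from the absence of signal in that digit. All names are hypothetical.

```python
def digits(number, base, n_digits):
    """Return the base-`base` digits of `number`, least significant first."""
    out = []
    for _ in range(n_digits):
        number, d = divmod(number, base)
        out.append(d)
    return out


def build_pools(n_variants, base, n_digits):
    """Map (digit_index, value) -> set of variant numbers in that pool.

    Only non-zero values get a pool, giving n_digits * (base - 1) pools.
    """
    if n_variants > base ** n_digits:
        raise ValueError("not enough digits to number every variant")
    pools = {(i, v): set() for i in range(n_digits) for v in range(1, base)}
    for variant in range(n_variants):
        for i, v in enumerate(digits(variant, base, n_digits)):
            if v:
                pools[(i, v)].add(variant)
    return pools


def decode(lit_pools, base):
    """Recover the variant number from the pools that hybridized."""
    return sum(v * base ** i for i, v in lit_pools)
```

With base 2 and four digits, four pools distinguish up to sixteen designations. Variant number 0 would light no pool at all in this scheme, so a practical numbering would probably start at 1.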

In a sixth embodiment, the invention provides a pooled
probe for trellis tiling, a further species of the general
tiling strategy. In trellis tiling, the identity of a
nucleotide in a target sequence is determined from a
comparison of hybridization intensities of three pooled
trellis probes. A pooled trellis probe comprises a segment
exactly complementary to a subsequence of a reference sequence
except at a first interrogation position occupied by a pooled
nucleotide N, a second interrogation position occupied by a
pooled nucleotide selected from the group of three consisting
of (1) M or K, (2) R or Y and (3) S or W, and a third
interrogation position occupied by a second pooled nucleotide
selected from the group. The pooled nucleotide occupying the
second interrogation position comprises a nucleotide
complementary to a corresponding nucleotide from the reference
sequence when the second pooled probe and reference sequence
are maximally aligned, and the pooled nucleotide occupying the
third interrogation position comprises a nucleotide
complementary to a corresponding nucleotide from the reference
sequence when the third pooled probe and the reference
sequence are maximally aligned. Standard IUPAC nomenclature
is used for describing pooled nucleotides.

In trellis tiling, an array comprises at least first,
second and third cells, respectively occupied by first, second
and third pooled probes, each according to the generic
description above. However, the segment of complementarity,
location of interrogation positions, and selection of pooled
nucleotide at each interrogation position may or may not
differ between the three pooled probes subject to the
following constraint. One of the three interrogation
positions in each of the three pooled probes must align with
the same corresponding nucleotide in the reference sequence.
This interrogation position must be occupied by an N in one of
the pooled probes, and a different pooled nucleotide in each
of the other two pooled probes.
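
The reason two two-fold pooled nucleotides suffice to identify one of four bases can be sketched as follows. The IUPAC sets are standard; reducing each reading to a matched/unmatched flag is a simplification of the relative-intensity comparison, and the function name is hypothetical. The result is the probe-side base that pairs with the target, so the target nucleotide is its complement.

```python
IUPAC = {
    "N": set("ACGT"),
    "M": set("AC"), "K": set("GT"),
    "R": set("AG"), "Y": set("CT"),
    "S": set("CG"), "W": set("AT"),
}


def call_probe_base(pooled_a, matched_a, pooled_b, matched_b):
    """Return the single probe-side base consistent with two pooled readings.

    A matched pooled nucleotide restricts the base to its members; an
    unmatched one restricts it to the remaining two bases.
    """
    candidates = set("ACGT")
    for symbol, matched in ((pooled_a, matched_a), (pooled_b, matched_b)):
        members = IUPAC[symbol]
        candidates &= members if matched else IUPAC["N"] - members
    if len(candidates) != 1:
        raise ValueError("pooled nucleotides must come from two different pairs")
    return candidates.pop()
```

The probe carrying N at the shared position pairs with any base there, so it can serve as the reference intensity against which the other two readings are judged.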

In a seventh embodiment, the invention provides arrays
for bridge tiling. Bridge tiling is a species of the general
tiling strategies noted above, in which probes from the first
probe set contain more than one segment of complementarity.
In bridge tiling, a nucleotide in a reference sequence is
usually determined from a comparison of four probes. A first
probe comprises at least first and second segments, each of at
least three nucleotides and each exactly complementary to
first and second subsequences of a reference sequence. The
segments include at least one interrogation position
corresponding to a nucleotide in the reference sequence.
Either (1) the first and second subsequences are noncontiguous
in the reference sequence, or (2) the first and second
subsequences are contiguous and the first and second segments
are inverted relative to the first and second subsequences.
The array further comprises second, third and fourth probes,
which are identical to a sequence comprising the first probe
or a subsequence thereof comprising at least three nucleotides
from each of the first and second segments, except in the at
least one interrogation position, which differs in each of the
probes. In a species of bridge tiling, referred to as
deletion tiling, the first and second subsequences are
separated by one or two nucleotides in the reference sequence.

In an eighth embodiment, the invention provides arrays of
probes for multiplex tiling. Multiplex tiling is a strategy
in which the identity of two nucleotides in a target sequence
is determined from a comparison of the hybridization
intensities of four probes, each having two interrogation
positions. Each of the probes comprises a segment of at
least 7 nucleotides that is exactly complementary to a
subsequence from a reference sequence, except that the segment
may or may not be exactly complementary at two interrogation
positions. The nucleotides occupying the interrogation
positions are selected by the following rules: (1) the first
interrogation position is occupied by a different nucleotide
in each of the four probes, (2) the second interrogation
position is occupied by a different nucleotide in each of the
four probes, (3) in first and second probes, the segment is
exactly complementary to the subsequence, except at no more
than one of the interrogation positions, (4) in third and
fourth probes, the segment is exactly complementary to the
subsequence, except at both of the interrogation positions.
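
One assignment of nucleotides that satisfies the four multiplex rules as stated can be sketched as below. The function name and the particular choice of which mismatched base goes into which probe are assumptions made for illustration; other assignments satisfy the same rules.

```python
BASES = "ACGT"


def multiplex_probes(complement_segment, pos1, pos2):
    """Return four probes with two interrogation positions each.

    `complement_segment` is the exact complement of the reference
    subsequence; `pos1` and `pos2` are 0-based interrogation positions.
    """
    if pos1 == pos2:
        raise ValueError("the two interrogation positions must differ")
    c1, c2 = complement_segment[pos1], complement_segment[pos2]
    others1 = [b for b in BASES if b != c1]
    others2 = [b for b in BASES if b != c2]
    pairs = [
        (c1, others2[0]),          # probe 1: mismatch only at position 2
        (others1[0], c2),          # probe 2: mismatch only at position 1
        (others1[1], others2[1]),  # probe 3: mismatch at both positions
        (others1[2], others2[2]),  # probe 4: mismatch at both positions
    ]
    probes = []
    for b1, b2 in pairs:
        seq = list(complement_segment)
        seq[pos1], seq[pos2] = b1, b2
        probes.append("".join(seq))
    return probes
```

Because each interrogation position must carry four different nucleotides, only one probe can be complementary at each position, which is why the first and second probes each end up with exactly one mismatch in this sketch.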

In a ninth embodiment, the invention provides arrays of
immobilized probes including helper mutations. Helper
mutations are useful for, e.g., preventing self-annealing of
probes having inverted repeats. In this strategy, the
identity of a nucleotide in a target sequence is usually
determined from a comparison of four probes. A first probe
comprises a segment of at least 7 nucleotides exactly
complementary to a subsequence of a reference sequence except
at one or two positions, the segment including an
interrogation position not at the one or two positions. The
one or two positions are occupied by helper mutations.
Second, third and fourth mutant probes are each identical to a
sequence comprising the wildtype probe or a subsequence
thereof including the interrogation position and the one or
two positions, except in the interrogation position, which is
occupied by a different nucleotide in each of the four probes.
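
To show where a helper mutation might be placed, the following sketch flags probes that contain an inverted repeat, i.e. a short stretch whose reverse complement occurs further along the same probe. The stem length of 4 and the function names are assumptions for illustration only; the specification does not prescribe a detection rule.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def has_inverted_repeat(probe, stem=4):
    """True if some `stem`-long stretch can pair with a later stretch."""
    for i in range(len(probe) - stem + 1):
        partner = reverse_complement(probe[i:i + stem])
        # Search only downstream of the current stretch (no overlap).
        if probe.find(partner, i + stem) != -1:
            return True
    return False
```

In a hypothetical probe GGCCAAAAGGCC the two GGCC arms can anneal to each other; changing one arm to GTCC removes the repeat, which is the kind of substitution a helper mutation would introduce away from the interrogation position.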

In a tenth embodiment, the invention provides arrays of
probes comprising at least two probe sets, but lacking a probe
set comprising probes that are perfectly matched to a
reference sequence. Such arrays are usually employed in
methods in which both reference and target sequence are
hybridized to the array. The first probe set comprises a
plurality of probes, each probe comprising a segment exactly
complementary to a subsequence of at least 3 nucleotides of a
reference sequence except at an interrogation position. The
second probe set comprises a corresponding probe for each
probe in the first probe set, the corresponding probe in the
second probe set being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes the
interrogation position, except that the interrogation position
is occupied by a different nucleotide in each of the two
corresponding probes and the complement to the reference
sequence.

In an eleventh embodiment, the invention provides methods
of comparing a target sequence with a reference sequence
comprising a predetermined sequence of nucleotides using any
of the arrays described above. The methods comprise
hybridizing the target nucleic acid to an array and
determining which probes, relative to one another, in the
array bind specifically to the target nucleic acid. The
relative specific binding of the probes indicates whether the
target sequence is the same or different from the reference
sequence. In some such methods, the target sequence has a
substituted nucleotide relative to the reference sequence in
at least one undetermined position, and the relative specific
binding of the probes indicates the location of the position
and the nucleotide occupying the position in the target
sequence. In some methods, a second target nucleic acid is
also hybridized to the array. The relative specific binding
of the probes then indicates both whether the target sequence
is the same or different from the reference sequence, and
whether the second target sequence is the same or different
from the reference sequence. In some methods, when the array
comprises two groups of probes tiled for first and second
reference sequences, respectively, the relative specific
binding of probes in the first group indicates whether the
target sequence is the same or different from the first
reference sequence. The relative specific binding of probes
in the second group indicates whether the target sequence is
the same or different from the second reference sequence.
Such methods are particularly useful for analyzing
heterologous alleles of a gene. Some methods entail
hybridizing both a reference sequence and a target sequence to
any of the arrays of probes described above. Comparison of
the relative specific binding of the probes to the reference
and target sequences indicates whether the target sequence is
the same or different from the reference sequence.
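
Reading a sequence from relative specific binding can be sketched as a simple comparison within each column of four probes. The sketch assumes that each lane is labeled with the target base it reads, and that a call is accepted only when the brightest lane exceeds the second brightest by a ratio of 1.2; that threshold, like the function names, is a hypothetical choice and not a value from the specification.

```python
def call_base(intensities, min_ratio=1.2):
    """Call one position from a {lane_base: intensity} column.

    Returns "N" when the brightest lane does not clearly dominate.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * second:
        return "N"
    return best


def compare_to_reference(reference, columns, min_ratio=1.2):
    """List (position, reference_base, called_base) for confident differences."""
    differences = []
    for pos, (ref_base, column) in enumerate(zip(reference, columns)):
        called = call_base(column, min_ratio)
        if called not in ("N", ref_base):
            differences.append((pos, ref_base, called))
    return differences
```

Positions returned as "N" correspond to ambiguous regions; they are neither confirmed as reference nor reported as substitutions.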

In a twelfth embodiment, the invention provides arrays of
immobilized probes in which the probes are designed to tile a
reference sequence from a human immunodeficiency virus.
Reference sequences from either the reverse transcriptase gene
or protease gene of HIV are of particular interest. Some
chips further comprise arrays of probes tiling a reference
sequence from a 16S RNA or DNA encoding the 16S RNA from a
pathogenic microorganism. The invention further provides
methods of using such arrays in analyzing an HIV target
sequence. The methods are particularly useful where the
target sequence has a substituted nucleotide relative to the
reference sequence in at least one position, the substitution
conferring resistance to a drug used in treating a patient
infected with an HIV virus. The methods reveal the existence
of the substituted nucleotide. The methods are also
particularly useful for analyzing a mixture of undetermined
proportions of first and second target sequences from
different HIV variants. The relative specific binding of
probes indicates the proportions of the first and second
target sequences.

In a thirteenth embodiment, the invention provides arrays
of probes tiled based on a reference sequence from a CFTR gene.
A preferred array comprises at least a group of probes
comprising a wildtype probe, and five sets of three mutant
probes. The wildtype probe is exactly complementary to a
subsequence of a reference sequence from a cystic fibrosis
gene, the segment having at least five interrogation positions
corresponding to five contiguous nucleotides in the reference
sequence. The probes in the first set of three mutant probes 
are each identical to the wildtype probe, except in a first of 
the five interrogation positions, which is occupied by a 
different nucleotide in each of the three mutant probes and 
the wildtype probe. The probes in the second set of three 
mutant probes are each identical to the wildtype probe, except 
in a second of the five interrogation positions, which is 
occupied by a different nucleotide in each of the three mutant 
probes and the wildtype probe. The probes in the third set of 
three mutant probes are each identical to the wildtype probe, 
except in a third of the five interrogation positions, which
is occupied by a different nucleotide in each of the three 
mutant probes and the wildtype probe. The probes in the 
fourth set of three mutant probes are each identical to the 
wildtype probe, except in a fourth of the five interrogation 
positions, which is occupied by a different nucleotide in each 
of the three mutant probes and the wildtype probe. The probes 
in the fifth set of three mutant probes are each identical to 
the wildtype probe, except in a fifth of the five 
interrogation positions, which is occupied by a different 
nucleotide in each of the three mutant probes and the wildtype 
probe. Preferably, a chip comprises two such groups of 
probes. The first group comprises a wildtype probe exactly 
complementary to a first reference sequence, and the second 
group comprises a wildtype probe exactly complementary to a 
second reference sequence that is a mutated form of the first 
reference sequence. 

The invention further provides methods of using the 
arrays of the invention for analyzing target sequences from a 
CFTR gene. The methods are capable of simultaneously 
analyzing first and second target sequences representing 
heterozygous alleles of a CFTR gene. 

In a fourteenth embodiment, the invention provides arrays
of probes tiling a reference sequence from a p53 gene, an
hMLH1 gene and/or an MSH2 gene. The invention further
provides methods of using the arrays described above to
analyze these genes. The methods are useful, e.g., for
diagnosing patients susceptible to developing cancer.

In a fifteenth embodiment, the invention provides arrays 
of probes tiling a reference sequence from a mitochondrial 
genome. The reference sequence may comprise part or all of 
the D-loop region, or all, or substantially all, of the 
mitochondrial genome. The invention further provides methods
of using the arrays described above to analyze target 
sequences from a mitochondrial genome. The methods are useful 
for identifying mutations associated with disease, and for 
forensic, epidemiological and evolutionary studies. 

BRIEF DESCRIPTION OF THE FIGURES 

Fig. 1: Basic tiling strategy. The figure illustrates 
the relationship between an interrogation position (I) and a 
corresponding nucleotide (n) in the reference sequence, and 
between a probe from the first probe set and corresponding 
probes from second, third and fourth probe sets. 

Fig. 2: Segment of complementarity in a probe from the
first probe set. 

Fig. 3: Incremental succession of probes in a basic 
tiling strategy. The figure shows four probe sets, each 
having three probes. Note that each probe differs from its 
predecessor in the same set by the acquisition of a 5' 
nucleotide and the loss of a 3' nucleotide, as well as in the
nucleotide occupying the interrogation position. 

Fig. 4: Exemplary arrangement of lanes on a chip. The 
chip shows four probe sets, each having five probes and each 
having a total of five interrogation positions (I1-I5), one
per probe. 

Fig. 5: Hybridization pattern of chip having probes laid
down in lanes. Dark patches indicate hybridization. The
probes in the lower part of the figure occur at the column of
the array indicated by the arrow when the probe length is 15
and the interrogation position is 7.

Fig. 6: Strategies for detecting deletion and insertion
mutations. Bases in brackets may or may not be present.

Fig. 7: Block tiling strategy. The probe from the first
probe set has three interrogation positions. The probes from
the other probe sets have only one of these interrogation
positions.

Fig. 8: Multiplex tiling strategy. Each probe has two
interrogation positions.

Fig. 9: Helper mutation strategy. The segment of
complementarity differs from the complement of the reference
sequence at a helper mutation as well as the interrogation
position.

Fig. 10: Layout of probes on the HV 407 chip. The figure
shows successive rows of sequence each of which is subdivided
into four lanes. The four lanes correspond to the A-, C-, G-
and T-lanes on the chip. Each probe is represented by the
nucleotide occupying its interrogation position. The letter
"N" indicates a control probe or empty column. The
different-sized probes are laid out in parallel. That is, from
top to bottom, a row of 13 mers is followed by a row of 15
mers, which is followed by a row of 17 mers, which is followed
by a row of 19 mers.

Fig. 11: Fluorescence pattern of HV 407 hybridized to a
target sequence (pPol19) identical to the chip's reference
sequence.

Fig. 12: Sequence read from HV 407 chip hybridized to
pPol19 and 4MUT18 (separate experiments). The reference
sequence is designated "wildtype." Beneath the reference
sequence are four rows of sequence read from the chip
hybridized to the pPol19 target, the first row being read from
13 mers, the second row from 15 mers, the third row from 17
mers and the fourth row from 19 mers. Beneath these
sequences, there are four further rows of sequence read from
the chip hybridized to the HXB2 target. Successive rows are
read from 13 mers, 15 mers, 17 mers and 19 mers. Each
nucleotide in a row is called from the relative fluorescence
intensities of probes in A-, C-, G- and T-lanes. Regions of
ambiguous sequence read from the chip are highlighted. The
strain differences between the HXB2 sequence and the reference
sequence that were correctly detected are indicated (*), and
those that could not be called are indicated (o). (The
nucleotide at position 417 was read correctly in some
experiments.) The location of some mutations known to be
associated with drug resistance that occur in readable regions
of the chip are shown above (codon number) and below (mutant
nucleotide) the sequence designated "wildtype." The locations
of primers used to amplify the target sequence are indicated by
arrows.

Fig. 13: Detection of mixed target sequences. The
mutant target differs from the wildtype by a single mutation
in codon 67 of the reverse transcriptase gene. Each
different-sized group of probes has a column of four probes for
reading the nucleotide in which the mutation occurs. The four
probes occupying a column are represented by a single probe in
the figure with the symbol (o) indicating the interrogation
position, which is occupied by a different nucleotide in each
probe.

Fig. 14: Fluorescence intensities of target bound to 13
mers and 15 mers for different proportions of mutant and
wildtype target. The fluorescence intensities are from probes
having interrogation positions for reading the nucleotide at
which the mutant and wildtype targets diverge.

Fig. 15: Sequence read from protease chip from four 
clinical samples before and after treatment with ddI.

Fig. 16: Block tiling array of probes for analyzing a
CFTR point mutation. Each probe shown actually represents four
probes, with one probe having each of A, C, G or T at the
interrogation position N. In the order shown, the first probe
shown on the left is tiled from the wildtype reference
sequence, the second probe from the mutant sequence, and so on
in alternating fashion. Note that all of the probes are
identical except at the interrogation position, which shifts
one position between successive probes tiled from the same
reference sequence (e.g., the first, third and fifth probes in
the left-hand column). The grid shows the hybridization
intensities when the array is hybridized to the reference
sequence.



WO 95/11995    PCT/US94/12305



Fig. 17: Hybridization pattern for heterozygous target. 
The figure shows the hybridization pattern when the array of 
the previous figure is hybridized to a mixture of mutant and 
wildtype reference sequences. 

Fig. 18, in panels A, B, and C, shows an image made from 
the region of a DNA chip containing CFTR exon 10 probes; in 
panel A, the chip was hybridized to a wild-type target; in 
panel C, the chip was hybridized to a mutant ΔF508 target; and
in panel B, the chip was hybridized to a mixture of the 
wild-type and mutant targets. 

Fig. 19, in sheets 1-3, corresponding to panels A, B,
and C of Fig. 18, shows graphs of fluorescence intensity
versus tiling position. The labels on the horizontal axis
show the bases in the wild-type sequence corresponding to the
position of substitution in the respective probes. Plotted
are the intensities observed from the features (or synthesis
sites) containing wild-type probes, the features containing
the substitution probes that bound the most target ("called"),
and the feature containing the substitution probes that bound
the target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 20, in panels A, B, and C, shows an image made from 
a region of a DNA chip containing CFTR exon 10 probes; in 
panel A, the chip was hybridized to the wt480 target; in panel
C, the chip was hybridized to the mu480 target; and in panel 
B, the chip was hybridized to a mixture of the wild-type and 
mutant targets. 

Fig. 21, in sheets 1-3, corresponding to panels A, B,
and C of Fig. 20, shows graphs of fluorescence intensity
versus tiling position. The labels on the horizontal axis
show the bases in the wild-type sequence corresponding to the
position of substitution in the respective probes. Plotted
are the intensities observed from the features (or synthesis
sites) containing wild-type probes, the features containing
the substitution probes that bound the most target ("called"),
and the feature containing the substitution probes that bound
the target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 22, in panels A and B, shows an image made from a
region of a DNA chip containing CFTR exon 10 probes; in panel
A, the chip was hybridized to nucleic acid derived from the
genomic DNA of an individual with wild-type ΔF508 sequences;
in panel B, the target nucleic acid originated from a
heterozygous (with respect to the ΔF508 mutation) individual.

Fig. 23, in sheets 1 and 2, corresponding to panels A and
B of Fig. 22, shows graphs of fluorescence intensity versus
tiling position. The labels on the horizontal axis show the
bases in the wild-type sequence corresponding to the position
of substitution in the respective probes. Plotted are the
intensities observed from the features (or synthesis sites)
containing wild-type probes, the features containing the
substitution probes that bound the most target ("called"), and
the feature containing the substitution probes that bound the
target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 24: Hybridization of homozygous wildtype (A) and 
heterozygous (B) target sequences from exon 11 of the CFTR 
gene to a block tiling array designed to detect G551D and
Q552X mutations in the CFTR gene.

Fig. 25: Hybridization of homozygous wildtype (A) and
ΔF508 mutant (B) target sequences from exon 10 of the CFTR
gene to a block tiling array designed to detect mutations,
ΔF508, ΔI507 and F508C.

Fig. 26: Hybridization of heterozygous mutant target
sequences, ΔF508/F508C, to the array of Fig. 25.

Fig. 27 shows the alignment of some of the probes on a
p53 DNA chip with a 12-mer model target nucleic acid.

Fig. 28 shows a set of 10-mer probes for a p53 exon 6 DNA
chip.

Fig. 29 shows that very distinct patterns are observed
after hybridization of p53 DNA chips with targets having
different 1 base substitutions. In the first image in Fig.
29, the 12-mer probes that form perfect matches with the
wild-type target are in the first row (top). The 12-mer
probes with single base mismatches are located in the second,
third, and fourth rows and have much lower signals.

Fig. 30, in graphs 2, 3, and 4, graphically depicts the
data in Fig. 29. On each graph, the X ordinate is the
position of the probe in its row on the chip, and the Y
ordinate is the signal at that probe site after hybridization.

Fig. 31 shows the results of hybridizing mixed target
populations of WT and mutant p53 genes to the p53 DNA chip.

Fig. 32, in graphs 1-4, shows (see Fig. 30 as well) the
hybridization efficiency of a 10-mer probe array as compared
to a 12-mer probe array.

Fig. 33 shows an image of a p53 DNA chip hybridized to a
target DNA.

Fig. 34 illustrates how the actual sequence was read from
the chip shown in Fig. 33. Gaps in the sequence of letters in
the WT rows correspond to control probes or sites. Positions
at which bases are miscalled are represented by letters in
italic type in cells corresponding to probes in which the WT
bases have been substituted by other bases.

Fig. 35 shows the human mitochondrial genome; "O_H" is the
H strand origin of replication, and arrows indicate the cloned
unshaded sequence.

Fig. 36 shows the image observed from application of a 
sample of mitochondrial DNA derived nucleic acid (from the mt4 
sample) on a DNA chip. 

Fig. 37 is similar to Fig. 36 but shows the image
observed from the mt5 sample.

Fig. 38 shows the predicted difference image between the
mt4 and mt5 samples on the DNA chip based on mismatches
between the two samples and the reference sequence.

Fig. 39 shows the actual difference image observed for
the mt4 and mt5 samples.

Fig. 40, in sheets 1 and 2, shows a plot of normalized 
intensities across rows 10 and 11 of the array and a 
tabulation of the mutations detected. 

Fig. 41 shows the discrimination between wild-type and
mutant hybrids obtained with the chip. A median of the six
normalized hybridization scores for each probe was taken; the
graph plots the ratio of the median score to the normalized
hybridization score versus mean counts. A ratio of 1.6 and
mean counts above 50 yield no false positives.

Fig. 42 illustrates how the identity of the base mismatch
may influence the ability to discriminate mutant and wild-type
sequences more than the position of the mismatch within an
oligonucleotide probe. The mismatch position is expressed as
% of probe length from the 3'-end. The base change is
indicated on the graph.

Fig. 43 provides a 5' to 3' sequence listing of one
target corresponding to the probes on the chip. X is a
control probe. Positions that differ in the target (i.e., are
mismatched with the probe at the designated site) are in bold.

Fig. 44 shows the fluorescence image produced by scanning
the chip described in Fig. 17 when hybridized to a sample.

Fig. 45 illustrates the detection of 4 transitions in the
target sequence relative to the wild-type probes on the chip
in Fig. 44.

Fig. 46: VLSIPS™ technology applied to the light
directed synthesis of oligonucleotides. Light (hv) is shone
through a mask (M1) to activate functional groups (-OH) on a
surface by removal of a protecting group (X). Nucleoside
building blocks protected with photoremovable protecting
groups (T-X, C-X) are coupled to the activated areas. By
repeating the irradiation and coupling steps, very complex
arrays of oligonucleotides can be prepared.

Fig. 47: Use of the VLSIPS™ process to prepare 
"nucleoside combinatorials" or oligonucleotides synthesized by 
coupling all four nucleosides to form dimers, trimers, and so 
forth. 

Fig. 48: Deprotection, coupling, and oxidation steps of
a solid phase DNA synthesis method.

Fig. 49: An illustrative synthesis route for the
nucleoside building blocks used in the VLSIPS™ method.

Fig. 50: A preferred photoremovable protecting group,
MeNPOC, and preparation of the group in active form.

Fig. 51: Detection system for scanning a DNA chip.

DETAILED DESCRIPTION OF THE INVENTION
The invention provides a number of strategies for 
comparing a polynucleotide of known sequence (a reference 
sequence) with variants of that sequence (target sequences).
The comparison can be performed at the level of entire
genomes, chromosomes, genes, exons or introns, or can focus on
individual mutant sites and immediately adjacent bases. The
strategies allow detection of variations, such as mutations or
polymorphisms, in the target sequence irrespective of whether a
particular variant has previously been characterized. The 
strategies both define the nature of a variant and identify 
its location in a target sequence. 

The strategies employ arrays of oligonucleotide probes 
immobilized to a solid support. Target sequences are analyzed 
by determining the extent of hybridization at particular 
probes in the array. The strategy in selection of probes 
facilitates distinction between perfectly matched probes and 
probes showing single-base or other degrees of mismatches. 
The strategy usually entails sampling each nucleotide of 
interest in a target sequence several times, thereby achieving 
a high degree of confidence in its identity. This level of 
confidence is further increased by sampling nucleotides in the
target sequence adjacent to the nucleotides of interest.
The number of probes on the chip can be quite large (e.g.,
10^5-10^6). However, usually only a small proportion of the
total number of probes of a given length are represented.
Some advantages of the use of only a small proportion of all
possible probes of a given length include: (i) each position
in the array is highly informative, whether or not
hybridization occurs; (ii) nonspecific hybridization is
minimized; (iii) it is straightforward to correlate
hybridization differences with sequence differences,
particularly with reference to the hybridization pattern of a
known standard; and (iv) the ability to address each probe
independently during synthesis, using high resolution
photolithography, allows the array to be designed and
optimized for any sequence. For example, the length of any
probe can be varied independently of the others.
The present tiling strategies result in sequencing and
comparison methods suitable for routine large-scale practice 
with a high degree of confidence in the sequence output. 

I. GENERAL TILING STRATEGIES

A. Selection of Reference Sequence 

The chips are designed to contain probes exhibiting
complementarity to one or more selected reference sequences
whose sequence is known. The chips are used to read a target
sequence comprising either the reference sequence itself or
variants of that sequence. Target sequences may differ from
the reference sequence at one or more positions but show a
high overall degree of sequence identity with the reference
sequence (e.g., at least 75, 90, 95, 99, 99.9 or 99.99%). Any
polynucleotide of known sequence can be selected as a
reference sequence. Reference sequences of interest include
sequences known to include mutations or polymorphisms
associated with phenotypic changes having clinical
significance in human patients. For example, the CFTR gene
and p53 gene in humans have been identified as the location of
several mutations resulting in cystic fibrosis or cancer,
respectively. Other reference sequences of interest include
those that serve to identify pathogenic microorganisms and/or
are the site of mutations by which such microorganisms acquire
drug resistance (e.g., the HIV reverse transcriptase gene).
Other reference sequences of interest include regions where
polymorphic variations are known to occur (e.g., the D-loop
region of mitochondrial DNA). These reference sequences have
utility for, e.g., forensic or epidemiological studies. Other
reference sequences of interest include p34 (related to p53),
p65 (implicated in breast, prostate and liver cancer), and DNA
segments encoding cytochromes P450 (see Meyer et al., Pharmac.
Ther. 46, 349-355 (1990)). Other reference sequences of
interest include those from the genome of pathogenic viruses
(e.g., hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1,
HAV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus,
influenza virus, flaviviruses, echovirus, rhinovirus,
coxsackie virus, coronavirus, respiratory syncytial virus,
mumps virus, rotavirus, measles virus, rubella virus,
parvovirus, vaccinia virus, HTLV virus, dengue virus,
papillomavirus, molluscum virus, poliovirus, rabies virus, JC
virus and arboviral encephalitis virus). Other reference
sequences of interest are from genomes or episomes of
pathogenic bacteria, particularly regions that confer drug
resistance or allow phylogenic characterization of the host
(e.g., 16S rRNA or corresponding DNA). For example, such
bacteria include chlamydia, rickettsial bacteria,
mycobacteria, staphylococci, streptococci, pneumococci,
meningococci and gonococci, klebsiella, proteus, serratia,
pseudomonas, legionella, diphtheria, salmonella, bacilli,
cholera, tetanus, botulism, anthrax, plague, leptospirosis,
and Lyme disease bacteria. Other reference sequences of
interest include those in which mutations result in the
following autosomal recessive disorders: sickle cell anemia,
β-thalassemia, phenylketonuria, galactosemia, Wilson's
disease, hemochromatosis, severe combined immunodeficiency,
alpha-1-antitrypsin deficiency, albinism, alkaptonuria,
lysosomal storage diseases and Ehlers-Danlos syndrome. Other
reference sequences of interest include those in which
mutations result in X-linked recessive disorders: hemophilia,
glucose-6-phosphate dehydrogenase, agammaglobulinemia,
diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy,
Wiskott-Aldrich syndrome, Fabry's disease and fragile X
syndrome. Other reference sequences of interest include
those in which mutations result in the following autosomal
dominant disorders: familial hypercholesterolemia, polycystic
kidney disease, Huntington's disease, hereditary
spherocytosis, Marfan's syndrome, von Willebrand's disease,
neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic
telangiectasia, familial colonic polyposis, Ehlers-Danlos
syndrome, myotonic dystrophy, muscular dystrophy, osteogenesis
imperfecta, acute intermittent porphyria, and
von Hippel-Lindau disease.

The length of a reference sequence can vary widely from a
full-length genome, to an individual chromosome, episome,
gene, component of a gene, such as an exon, intron or
regulatory sequences, to a few nucleotides. A reference
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000,
5,000 or 10,000, 20,000 or 100,000 nucleotides is common.
Sometimes only particular regions of a sequence (e.g., exons
of a gene) are of interest. In such situations, the
particular regions can be considered as separate reference
sequences or can be considered as components of a single
reference sequence, as a matter of arbitrary choice.

A reference sequence can be any naturally occurring,
mutant, consensus or purely hypothetical sequence of
nucleotides, RNA or DNA. For example, sequences can be
obtained from computer data bases, publications or can be
determined or conceived de novo. Usually, a reference
sequence is selected to show a high degree of sequence
identity to envisaged target sequences. Often, particularly
where a significant degree of divergence is anticipated
between target sequences, more than one reference sequence is
selected. Combinations of wildtype and mutant reference
sequences are employed in several applications of the tiling
strategy.

B. Chip Design

1. Basic Tiling Strategy 

The basic tiling strategy provides an array of
immobilized probes for analysis of target sequences showing a
high degree of sequence identity to one or more selected
reference sequences. The strategy is first illustrated for an
array that is subdivided into four probe sets, although it
will be apparent that in some situations, satisfactory results
are obtained from only two probe sets. A first probe set
comprises a plurality of probes exhibiting perfect
complementarity with a selected reference sequence. The
perfect complementarity usually exists throughout the length
of the probe. However, probes having a segment or segments of
perfect complementarity that is/are flanked by leading or
trailing sequences lacking complementarity to the reference
sequence can also be used. Within a segment of
complementarity, each probe in the first probe set has at
least one interrogation position that corresponds to a
nucleotide in the reference sequence. That is, the
interrogation position is aligned with the corresponding
nucleotide in the reference sequence when the probe and
reference sequence are aligned to maximize complementarity
between the two. If a probe has more than one interrogation
position, each corresponds with a respective nucleotide in the
reference sequence. The identity of an interrogation position
and corresponding nucleotide in a particular probe in the
first probe set cannot be determined simply by inspection of
the probe in the first set. As will become apparent, an
interrogation position and corresponding nucleotide is defined
by the comparative structures of probes in the first probe set
and corresponding probes from additional probe sets.

In principle, a probe could have an interrogation
position at each position in the segment complementary to the
reference sequence. Sometimes, interrogation positions
provide more accurate data when located away from the ends of
a segment of complementarity. Thus, typically a probe having
a segment of complementarity of length x does not contain more
than x-2 interrogation positions. Since probes are typically
9-21 nucleotides, and usually all of a probe is complementary,
a probe typically has 1-19 interrogation positions. Often the
probes contain a single interrogation position, at or near the
center of the probe.

For each probe in the first set, there are, for purposes
of the present illustration, three corresponding probes from
three additional probe sets. See Fig. 1. Thus, there are
four probes corresponding to each nucleotide of interest in
the reference sequence. Each of the four corresponding probes
has an interrogation position aligned with that nucleotide of
interest. Usually, the probes from the three additional
probe sets are identical to the corresponding probe from the
first probe set with one exception. The exception is that at
least one (and often only one) interrogation position, which
occurs in the same position in each of the four corresponding
probes from the four probe sets, is occupied by a different
nucleotide in the four probe sets. For example, for an A
nucleotide in the reference sequence, the corresponding probe
from the first probe set has its interrogation position
occupied by a T, and the corresponding probes from the
additional three probe sets have their respective
interrogation positions occupied by A, C, or G, a different
nucleotide in each probe. Of course, if a probe from the
first probe set comprises trailing or flanking sequences
lacking complementarity to the reference sequences (see
Fig. 2), these sequences need not be present in corresponding
probes from the three additional sets. Likewise, corresponding
probes from the three additional sets can contain leading or
trailing sequences outside the segment of complementarity that
are not present in the corresponding probe from the first
probe set. Occasionally, the probes from the additional three
probe sets are identical (with the exception of interrogation
position(s)) to a contiguous subsequence of the full
complementary segment of the corresponding probe from the
first probe set. In this case, the subsequence includes the
interrogation position and usually differs from the
full-length probe only in the omission of one or both terminal
nucleotides from the termini of a segment of complementarity.
That is, if a probe from the first probe set has a segment of
complementarity of length n, corresponding probes from the
other sets will usually include a subsequence of the segment
of at least length n-2. Thus, the subsequence is usually at
least 3, 4, 7, 9, 15, 21, or 25 nucleotides long, most
typically, in the range of 9-21 nucleotides. The subsequence
should be sufficiently long to allow a probe to hybridize
detectably more strongly to a variant of the reference
sequence mutated at the interrogation position than to the
reference sequence.
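
By way of illustration only, the construction of the four corresponding
probes for one nucleotide of interest can be sketched in Python as
follows. The name probe_column, its parameters, and the convention of
writing each probe 3' to 5' so that it aligns base-for-base with the
reference are assumptions made for this sketch and form no part of the
disclosure; flanking sequences and shortened subsequences are not
modeled.

```python
# Hypothetical helper, DNA probes only (A, C, G, T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_column(reference, start, length, offset):
    """Build the four corresponding probes for one interrogation position.

    reference: reference sequence written 5'->3'.
    start, length: the segment reference[start:start + length] to be probed.
    offset: 0-based index, within the probe, of the interrogation position.

    Probes are written 3'->5' so that probe[i] pairs with
    reference[start + i]. Returns (perfect, column), where perfect is the
    first-set probe and column maps each base placed at the interrogation
    position to the resulting probe.
    """
    segment = reference[start:start + length]
    if len(segment) != length or not 0 <= offset < length:
        raise ValueError("segment or interrogation position out of range")
    perfect = "".join(COMPLEMENT[base] for base in segment)
    column = {
        base: perfect[:offset] + base + perfect[offset + 1:]
        for base in "ACGT"
    }
    return perfect, column


# Example: the reference has an A at the interrogated nucleotide, so the
# first-set probe carries a T there and the other three carry A, C and G.
perfect, column = probe_column("GCAAAGC", start=0, length=7, offset=3)
```

In this sketch exactly one of the four probes in the returned column
is identical to the first-set probe; the other three differ from it
only at the interrogation position.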

The probes can be oligodeoxyribonucleotides or
oligoribonucleotides, or any modified forms of these polymers
that are capable of hybridizing with a target nucleic acid
sequence by complementary base-pairing. Complementary base
pairing means sequence-specific base pairing which includes, e.g.,
Watson-Crick base pairing as well as other forms of base
pairing such as Hoogsteen base pairing. Modified forms
include 2'-O-methyl oligoribonucleotides and so-called PNAs,
in which oligodeoxyribonucleotides are linked via peptide
bonds rather than phosphodiester bonds. The probes can be
attached by any linkage to a support (e.g., 3', 5' or via the
base). 3' attachment is more usual as this orientation is
compatible with the preferred chemistry for solid phase
synthesis of oligonucleotides.

The number of probes in the first probe set (and as a
consequence the number of probes in additional probe sets)
depends on the length of the reference sequence, the number of
nucleotides of interest in the reference sequence and the
number of interrogation positions per probe. In general, each
nucleotide of interest in the reference sequence requires the
same interrogation position in the four sets of probes.
Consider, as an example, a reference sequence of 100
nucleotides, 50 of which are of interest, and probes each
having a single interrogation position. In this situation,
the first probe set requires fifty probes, each having one
interrogation position corresponding to a nucleotide of
interest in the reference sequence. The second, third and
fourth probe sets each have a corresponding probe for each
probe in the first probe set, and so each also contains a
total of fifty probes. The identity of each nucleotide of
interest in the reference sequence is determined by comparing
the relative hybridization signals at four probes having
interrogation positions corresponding to that nucleotide from
the four probe sets.

In some reference sequences, every nucleotide is of
interest. In other reference sequences, only certain portions
in which variants (e.g., mutations or polymorphisms) are
concentrated are of interest. In other reference sequences,
only particular mutations or polymorphisms and immediately
adjacent nucleotides are of interest. Usually, the first
probe set has interrogation positions selected to correspond
to at least a nucleotide (e.g., representing a point mutation)
and one immediately adjacent nucleotide. Usually, the probes
in the first set have interrogation positions corresponding to
at least 3, 10, 50, 100, 1000, or 20,000 contiguous
nucleotides. The probes usually have interrogation positions
corresponding to at least 5, 10, 30, 50, 75, 90, 99 or
sometimes 100% of the nucleotides in a reference sequence.
Frequently, the probes in the first probe set completely span
the reference sequence and overlap with one another relative
to the reference sequence. For example, in one common
arrangement each probe in the first probe set differs from
another probe in that set by the omission of a 3' base
complementary to the reference sequence and the acquisition of
a 5' base complementary to the reference sequence. See
Fig. 3.
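
The overlapping arrangement just described can be sketched, under
stated assumptions, as a sliding window over the reference sequence.
The function name tile_first_probe_set and the convention of writing
each probe 3' to 5' (so that it aligns base-for-base with the
reference written 5' to 3') are hypothetical choices made for this
sketch only.

```python
# Hypothetical helper, DNA probes only (A, C, G, T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_first_probe_set(reference, probe_length):
    """Return perfectly complementary probes tiled one base apart.

    reference: reference sequence written 5'->3'.
    Each probe is written 3'->5' so that probe[i] pairs with the
    reference base at the same offset within its window. Moving one step
    along the reference drops the 3' base of the previous probe and
    adds one new base at the 5' end.
    """
    if not 0 < probe_length <= len(reference):
        raise ValueError("probe_length must be between 1 and len(reference)")
    probes = []
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        probes.append("".join(COMPLEMENT[base] for base in window))
    return probes


# Example with a short made-up reference and 4-mer probes.
probes = tile_first_probe_set("ACGTAC", 4)
```

A reference of length N tiled with probes of length L in this way
yields N - L + 1 probes in the first probe set.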

For conceptual simplicity, the probes in a set are
usually arranged in order of the sequence in a lane across the
chip. A lane contains a series of overlapping probes, which
represent, or tile across, the selected reference sequence (see
Fig. 3). The components of the four sets of probes are
usually laid down in four parallel lanes, collectively
constituting a row in the horizontal direction and a series of
4-member columns in the vertical direction. Corresponding
probes from the four probe sets (i.e., complementary to the
same subsequence of the reference sequence) occupy a column.
Each probe in a lane usually differs from its predecessor in
the lane by the omission of a base at one end and the
inclusion of an additional base at the other end as shown in
Fig. 3. However, this orderly progression of probes can be
interrupted by the inclusion of control probes or omission of
probes in certain columns of the array. Such columns serve as
controls to orient the chip, or gauge the background, which
can include target sequence nonspecifically bound to the chip.

The probe sets are usually laid down in lanes such that
all probes having an interrogation position occupied by an A
form an A-lane, all probes having an interrogation position
occupied by a C form a C-lane, all probes having an
interrogation position occupied by a G form a G-lane, and all
probes having an interrogation position occupied by a T (or U)
form a T-lane (or a U-lane). Note that in this arrangement
there is not a unique correspondence between probe sets and
lanes. Thus, the probe from the first probe set is laid down
in the A-lane, C-lane, A-lane, A-lane and T-lane for the five
columns in Fig. 4. The interrogation position on a column of
probes corresponds to the position in the target sequence
whose identity is determined from analysis of hybridization to
the probes in that column. Thus, I1-I5 respectively
correspond to N1-N5 in Fig. 4. The interrogation position can
be anywhere in a probe but is usually at or near the central
position of the probe to maximize differential hybridization
signals between a perfect match and a single-base mismatch.
For example, for an 11 mer probe, the central position is the
sixth nucleotide.
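
The relationship between reference nucleotides, lanes, and the
central interrogation position can be illustrated with the following
Python sketch. The function names are hypothetical, and the five-base
reference "TGTTA" is an invented example chosen only because it
reproduces the A, C, A, A, T lane order mentioned above; it is not
taken from Fig. 4.

```python
# Hypothetical helpers, DNA probes only (A, C, G, T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def central_position(probe_length):
    """Return the 1-based central position of an odd-length probe."""
    if probe_length < 1 or probe_length % 2 == 0:
        raise ValueError("an exact center exists only for odd lengths")
    return (probe_length + 1) // 2


def first_set_lanes(reference_bases):
    """Name the lane holding the first-set probe in each column.

    reference_bases: the reference nucleotides interrogated by
    successive columns. The first-set probe is complementary to the
    reference, so it falls in the lane named for the complement of the
    reference nucleotide.
    """
    return [COMPLEMENT[base] for base in reference_bases]


center = central_position(11)       # sixth nucleotide of an 11 mer
lanes = first_set_lanes("TGTTA")    # invented five-column example
```

Because the lane of the first-set probe depends on the reference
nucleotide in each column, the first probe set wanders between lanes
from column to column, which is why probe sets and lanes do not
correspond one-to-one.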

Although the array of probes is usually laid down in rows
and columns as described above, such a physical arrangement of
probes on the chip is not essential. Provided that the
spatial location of each probe in an array is known, the data
from the probes can be collected and processed to yield the
sequence of a target irrespective of the physical arrangement
of the probes on a chip. In processing the data, the
hybridization signals from the respective probes can be
reassorted into any conceptual array desired for subsequent
data reduction whatever the physical arrangement of probes on
the chip.
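
As a minimal sketch of such reassortment, assuming the chip layout is
available as a lookup table from physical location to conceptual
column and lane, the measured signals can be regrouped as shown below.
The function name reassort, the dictionary-based data structures, and
the intensity values are all hypothetical and serve only to illustrate
the idea.

```python
def reassort(signals, layout):
    """Regroup signals from physical locations into a conceptual array.

    signals: {(x, y): intensity} measured at physical chip locations.
    layout:  {(x, y): (column, lane)} giving, for each location, the
             conceptual column and lane ("A", "C", "G" or "T") of the
             probe synthesized there.
    Returns {column: {lane: intensity}}.
    """
    conceptual = {}
    for location, intensity in signals.items():
        column, lane = layout[location]
        conceptual.setdefault(column, {})[lane] = intensity
    return conceptual


# Invented example: probes of two columns scattered over a 2 x 2 area.
layout = {(0, 0): (2, "G"), (0, 1): (1, "A"),
          (1, 0): (1, "C"), (1, 1): (2, "T")}
signals = {(0, 0): 40.0, (0, 1): 950.0, (1, 0): 30.0, (1, 1): 800.0}
conceptual = reassort(signals, layout)
```

Once the signals are held in this conceptual form, subsequent data
reduction can proceed column by column without regard to where each
probe physically sits on the chip.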

A range of lengths of probes can be employed in the
chips. As noted above, a probe may consist exclusively of a
complementary segment, or may have one or more complementary
segments juxtaposed by flanking, trailing and/or intervening
segments. In the latter situation, the total length of
complementary segment(s) is more important than the length of
the probe. In functional terms, the complementarity
segment(s) of the first probe set should be sufficiently long
to allow the probe to hybridize detectably more strongly to a
reference sequence compared with a variant of the reference
including a single base mutation at the nucleotide
corresponding to the interrogation position of the probe.
Similarly, the complementarity segment(s) in corresponding
probes from additional probe sets should be sufficiently long
to allow a probe to hybridize detectably more strongly to a
variant of the reference sequence having a single nucleotide
substitution at the interrogation position relative to the
reference sequence. A probe usually has a single
complementary segment having a length of at least
3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or
30 bases exhibiting perfect complementarity (other than
possibly at the interrogation position(s) depending on the
probe set) to the reference sequence. In bridging strategies,
where more than one segment of complementarity is present,
each segment provides at least three complementary nucleotides
to the reference sequence and the combined segments provide at
least two segments of three or a total of six complementary
nucleotides. As in the other strategies, the combined length
of complementary segments is typically from 6-30 nucleotides,
and preferably from about 9-21 nucleotides. The two segments
are often approximately the same length. Often, the probes
(or segment of complementarity within probes) have an odd
number of bases, so that an interrogation position can occur
in the exact center of the probe.

In some chips, all probes are the same length. Other 
chips employ different groups of probe sets, in which case the
probes are of the same size within a group, but differ between 
different groups. For example, some chips have one group 
comprising four sets of probes as described above in which all 
the probes are 11 mers, together with a second group 
comprising four sets of probes in which all of the probes are 
13 mers. Of course, additional groups of probes can be added. 
Thus, some chips contain, e.g., four groups of probes having 
sizes of 11 mers, 13 mers, 15 mers and 17 mers. Other chips 
have different size probes within the same group of four probe 
sets. In these chips, the probes in the first set can vary in 
length independently of each other. Probes in the other sets 
are usually the same length as the probe occupying the same 
column from the first set. However, occasionally different 
lengths of probes can be included at the same column position 
in the four lanes. The different length probes are included 
to equalize hybridization signals from probes irrespective of
whether A-T or C-G bonds are formed at the interrogation
position.

The length of probe can be important in distinguishing 
between a perfectly matched probe and probes showing a single- 
base mismatch with the target sequence. The discrimination is 
usually greater for short probes. Shorter probes are usually 
also less susceptible to formation of secondary structures. 
However, the absolute amount of target sequence bound, and 
hence the signal, is greater for longer probes. The probe
length representing the optimum compromise between these 
competing considerations may vary depending on inter alia the 
GC content of a particular region of the target DNA sequence, 
secondary structure, synthesis efficiency and cross- 
hybridization. In some regions of the target, depending on 
hybridization conditions, short probes (e.g., 11 mers) may 
provide information that is inaccessible from longer probes 
(e.g., 19 mers) and vice versa. Maximum sequence information 
can be read by including several groups of different sized 
probes on the chip as noted above. However, for many regions 
of the target sequence, such a strategy provides redundant 
information in that the same sequence is read multiple times 
from the different groups of probes. Equivalent information
can be obtained from a single group of different sized probes 
in which the sizes are selected to maximize readable sequence 
at particular regions of the target sequence. The appropriate 
size of probes at different regions of the target sequence can 
be determined from, e.g., Fig. 12, which compares the 
readability of different sized probes in different regions of 
a target. The strategy of customizing probe length within a
single group of probe sets minimizes the total number of 
probes required to read a particular target sequence. This 
leaves ample capacity for the chip to include probes to other 
reference sequences. 

The invention provides an optimization block which allows 
systematic variation of probe length and interrogation 
position to optimize the selection of probes for analyzing a 
particular nucleotide in a reference sequence. The block 
comprises alternating columns of probes complementary to the
wildtype target and probes complementary to a specific
mutation. The interrogation position is varied between 
columns and probe length is varied down a column. 
Hybridization of the chip to the reference sequence or the 
mutant form of the reference sequence identifies the probe 
length and interrogation position providing the greatest 
differential hybridization signal. 
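
One way to picture the selection made from an optimization block is
sketched below in Python. The sketch assumes, purely for illustration,
that the differential hybridization signal is scored as the ratio of
the perfectly matched signal to the mismatched signal; the text does
not specify the measure, and the function name best_probe_design and
the intensity values are hypothetical.

```python
def best_probe_design(block_signals):
    """Pick the probe length and interrogation position that discriminate best.

    block_signals: {(probe_length, interrogation_position):
                    (matched_signal, mismatched_signal)}
    The score used here is matched / mismatched, an assumption made for
    this sketch. A small floor avoids division by zero when the
    mismatched probe gives no signal.
    """
    if not block_signals:
        raise ValueError("no measurements supplied")

    def discrimination(design):
        matched, mismatched = block_signals[design]
        return matched / max(mismatched, 1e-9)

    return max(block_signals, key=discrimination)


# Invented intensities for three designs in one optimization block.
block_signals = {
    (11, 6): (900.0, 300.0),    # ratio 3.0
    (13, 7): (1200.0, 200.0),   # ratio 6.0
    (15, 8): (1500.0, 600.0),   # ratio 2.5
}
best = best_probe_design(block_signals)
```

A difference of signals, or a score that also rewards high absolute
signal, could be substituted for the ratio without changing the
structure of the selection.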

The probes are designed to be complementary to either 
strand of the reference sequence (e.g., coding or non-coding). 
Some chips contain separate groups of probes, one 
complementary to the coding strand, the other complementary to 
the noncoding strand. Independent analysis of coding and 
noncoding strands provides largely redundant information. 
However, the regions of ambiguity in reading the coding strand 
are not always the same as those in reading the noncoding 
strand. Thus, combination of the information from coding and 
noncoding strands increases the overall accuracy of 
sequencing. 

Some chips contain additional probes or groups of probes 
designed to be complementary to a second reference sequence. 
The second reference sequence is often a subsequence of the 
first reference sequence bearing one or more commonly 
occurring mutations or interstrain variations. The second 
group of probes is designed by the same principles as 
described above except that the probes exhibit complementarity 
to the second reference sequence. The inclusion of a second 
group is particularly useful for analyzing short subsequences of
the primary reference sequence in which multiple mutations are 
expected to occur within a short distance commensurate with 
the length of the probes (i.e., two or more mutations within 9 
to 21 bases). Of course, the same principle can be extended
to provide chips containing groups of probes for any number of 
reference sequences. Alternatively, the chips may contain 
additional probe(s) that do not form part of a tiled array as
noted above, but rather serve as probe(s) for a conventional
reverse dot blot. For example, the presence of a mutation can
be detected from binding of a target sequence to a single
oligomeric probe harboring the mutation. Preferably, an
additional probe containing the equivalent region of the
wildtype sequence is included as a control.

The chips are read by comparing the intensities of
labelled target bound to the probes in an array.
Specifically, a comparison is performed between each lane of
probes (e.g., A, C, G and T lanes) at each columnar position
(physical or conceptual). For a particular columnar position,
the lane showing the greatest hybridization signal is called
as the nucleotide present at the position in the target
sequence corresponding to the interrogation position in the
probes. See Fig. 5. The corresponding position in the target
sequence is that aligned with the interrogation position in
corresponding probes when the probes and target are aligned to
maximize complementarity. Of the four probes in a column,
only one can exhibit a perfect match to the target sequence
whereas the others usually exhibit at least a one base pair
mismatch. The probe exhibiting a perfect match usually
produces a substantially greater hybridization signal than the
other three probes in the column and is thereby easily
identified. However, in some regions of the target sequence,
the distinction between a perfect match and a one-base
mismatch is less clear. Thus, a call ratio is established to
define the ratio of signal from the best hybridizing probe to
the second best hybridizing probe that must be exceeded for a
particular target position to be read from the probes. A high
call ratio ensures that few if any errors are made in calling
target nucleotides, but can result in some nucleotides being
scored as ambiguous, which could in fact be accurately read.
A lower call ratio results in fewer ambiguous calls, but can
result in more erroneous calls. It has been found that at a
call ratio of 1.2 virtually all calls are accurate. However,
a small but significant number of bases (e.g., up to about
10%) may have to be scored as ambiguous.
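
The call-ratio rule can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary of lane intensities and the use of 'N' for an ambiguous call are assumptions made for the example, and the default of 1.2 is simply the call ratio mentioned above.

```python
def call_base(intensities, call_ratio=1.2):
    """Call the nucleotide at one columnar position.

    intensities: mapping from lane ('A', 'C', 'G', 'T') to the
    hybridization signal of that lane's probe in the column.
    Returns the called base, or 'N' if the best signal does not
    exceed the second best by more than the call ratio.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if second <= 0:
        # No competing signal: call only if there is any signal at all.
        return best_base if best > 0 else "N"
    return best_base if best / second > call_ratio else "N"
```

For example, lane signals of 100, 20, 15 and 10 give a clear call, whereas signals of 100 and 90 in the two strongest lanes give a ratio of about 1.1 and the position is scored as ambiguous.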

Although small regions of the target sequence can
sometimes be ambiguous, these regions usually occur at the
same or similar segments in different target sequences. Thus,
for precharacterized mutations, it is known in advance whether
that mutation is likely to occur within a region of
unambiguously determinable sequence.

An array of probes is most useful for analyzing the
reference sequence from which the probes were designed and
variants of that sequence exhibiting substantial sequence
similarity with the reference sequence (e.g., several
single-base mutants spaced over the reference sequence). When an
array is used to analyze the exact reference sequence from
which it was designed, one probe exhibits a perfect match to
the reference sequence, and the other three probes in the same
column exhibit single-base mismatches. Thus, discrimination
between hybridization signals is usually high and accurate
sequence is obtained. High accuracy is also obtained when an
array is used for analyzing a target sequence comprising a
variant of the reference sequence that has a single mutation
relative to the reference sequence, or several widely spaced
mutations relative to the reference sequence. At different
mutant loci, one probe exhibits a perfect match to the target,
and the other three probes occupying the same column exhibit
single-base mismatches, the difference (with respect to
analysis of the reference sequence) being the lane in which
the perfect match occurs.

For target sequences showing a high degree of divergence
from the reference strain or incorporating several closely
spaced mutations from the reference strain, a single group of
probes (i.e., designed with respect to a single reference
sequence) will not always provide accurate sequence for the
highly variant region of this sequence. At some particular
columnar positions, it may be that no single probe exhibits
perfect complementarity to the target and that any comparison
must be based on different degrees of mismatch between the
four probes. Such a comparison does not always allow the
target nucleotide corresponding to that columnar position to
be called. Deletions in target sequences can be detected by
loss of signal from probes having interrogation positions
encompassed by the deletion. However, signal may also be lost
from probes having interrogation positions closely proximal to
the deletion, resulting in some regions of the target sequence
that cannot be read. Target sequences bearing insertions will
also exhibit short regions including and proximal to the
insertion that usually cannot be read.

The presence of short regions of difficult-to-read target
because of closely spaced mutations, insertions or deletions
does not prevent determination of the remaining sequence of
the target, as different regions of a target sequence are
determined independently. Moreover, such ambiguities as might
result from analysis of diverse variants with a single group
of probes can be avoided by including multiple groups of probe
sets on a chip. For example, one group of probes can be
designed based on a full-length reference sequence, and the
other groups on subsequences of the reference sequence
incorporating frequently occurring mutations or strain
variations.

A particular advantage of the present sequencing strategy
over conventional sequencing methods is the capacity
simultaneously to detect and quantify proportions of multiple
target sequences. Such capacity is valuable, e.g., for
diagnosis of patients who are heterozygous with respect to a
gene or who are infected with a virus, such as HIV, which is
usually present in several polymorphic forms. Such capacity
is also useful in analyzing targets from biopsies of tumor
cells and surrounding tissues. The presence of multiple
target sequences is detected from the relative signals of the
four probes at the array columns corresponding to the target
nucleotides at which diversity occurs. The relative signals
at the four probes for the mixture under test are compared
with the corresponding signals from a homogeneous reference
sequence. An increase in a signal from a probe that is
mismatched with respect to the reference sequence, and a
corresponding decrease in the signal from the probe which is
matched with the reference sequence, signal the presence of a
mutant strain in the mixture. The extent of the shift in
hybridization signals of the probes is related to the
proportion of a target sequence in the mixture. Shifts in
relative hybridization signals can be quantitatively related
to proportions of reference and mutant sequence by prior
calibration of the chip with seeded mixtures of the mutant and
reference sequences. By this means, a chip can be used to
detect variant or mutant strains constituting as little as 1,
5, 20, or 25% of a mixture of strains.
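
One way such a calibration might be applied is sketched below. This is an assumption-laden illustration: the function names, the use of the mutant probe's share of the combined signal as the measured quantity, and piecewise-linear interpolation between calibration points are all choices made for the example and are not specified above. The calibration values shown in the usage note are invented placeholders, not measured data.

```python
def mutant_signal_share(wt_signal, mut_signal):
    """Fraction of the combined signal contributed by the mutant-matched probe."""
    total = wt_signal + mut_signal
    if total <= 0:
        raise ValueError("no hybridization signal")
    return mut_signal / total


def estimate_mutant_fraction(share, calibration):
    """Interpolate the mutant proportion from seeded-mixture calibration points.

    calibration: list of (known_mutant_fraction, observed_signal_share)
    pairs obtained by hybridizing seeded mixtures to the chip. The
    observed share is assumed to rise monotonically with the fraction.
    """
    points = sorted(calibration, key=lambda p: p[1])
    if share <= points[0][1]:
        return points[0][0]
    if share >= points[-1][1]:
        return points[-1][0]
    for (f0, s0), (f1, s1) in zip(points, points[1:]):
        if s0 <= share <= s1:
            if s1 == s0:
                return f0
            # Linear interpolation between the two bracketing points.
            return f0 + (f1 - f0) * (share - s0) / (s1 - s0)
```

With a hypothetical calibration of [(0.0, 0.10), (0.25, 0.30), (0.5, 0.50), (1.0, 0.90)], a measured share of 0.40 falls midway between the second and third points and is interpreted as a mutant proportion of about 37.5%.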

Similar principles allow the simultaneous analysis of
multiple target sequences even when none is identical to the
reference sequence. For example, with a mixture of two target
sequences bearing first and second mutations, there would be a
variation in the hybridization patterns of probes having
interrogation positions corresponding to the first and second
mutations relative to the hybridization pattern with the
reference sequence. At each position, one of the probes
having a mismatched interrogation position relative to the
reference sequence would show an increase in hybridization
signal, and the probe having a matched interrogation position
relative to the reference sequence would show a decrease in
hybridization signal. Analysis of the hybridization pattern
of the mixture of mutant target sequences, preferably in
comparison with the hybridization pattern of the reference
sequence, indicates the presence of two mutant target
sequences, the position and nature of the mutation in each
strain, and the relative proportions of each strain.

In a variation of the above method, the different
components in a mixture of target sequences are differentially
labelled before being applied to the array. For example, a
variety of fluorescent labels emitting at different wavelengths
are available. The use of differential labels allows
independent analysis of different targets bound simultaneously
to the array. For example, the methods permit comparison of
target sequences obtained from a patient at different stages
of a disease.

2. Omission of Probes 
The general strategy outlined above employs four probes
to read each nucleotide of interest in a target sequence. One
probe (from the first probe set) shows a perfect match to the
reference sequence and the other three probes (from the
second, third and fourth probe sets) exhibit a mismatch with
the reference sequence and a perfect match with a target
sequence bearing a mutation at the nucleotide of interest.
The provision of three probes from the second, third and
fourth probe sets allows detection of each of the three
possible nucleotide substitutions of any nucleotide of
interest. However, in some reference sequences or regions of
reference sequences, it is known in advance that only certain
mutations are likely to occur. Thus, for example, at one site
it might be known that an A nucleotide in the reference
sequence may exist as a T mutant in some target sequences but
is unlikely to exist as a C or G mutant. Accordingly, for
analysis of this region of the reference sequence, one might
include only the first and second probe sets, the first probe
set exhibiting perfect complementarity to the reference
sequence, and the second probe set having an interrogation
position occupied by an invariant A residue (for detecting the
T mutant). In other situations, one might include the first,
second and third probe sets (but not the fourth) for
detection of a wildtype nucleotide in the reference sequence
and two mutant variants thereof in target sequences. In some
chips, probes that would detect silent mutations (i.e., not
affecting amino acid sequence) are omitted.

In some chips, the probes from the first probe set are
omitted corresponding to some or all positions of the
reference sequences. Such chips comprise at least two probe
sets. The first probe set has a plurality of probes. Each
probe comprises a segment exactly complementary to a
subsequence of a reference sequence except in at least one
interrogation position. A second probe set has a
corresponding probe for each probe in the first probe set.
The corresponding probe in the second probe set is identical
to a sequence comprising the corresponding probe from the
first probe set or a subsequence thereof that includes the at
least one (and usually only one) interrogation position except
that the at least one interrogation position is occupied by a
different nucleotide in each of the two corresponding probes
from the first and second probe sets. A third probe set, if
present, also comprises a corresponding probe for each probe
in the first probe set except at the at least one
interrogation position, which differs in the corresponding
probes from the three sets. Omission of probes having a
segment exhibiting perfect complementarity to the reference
sequence results in loss of control information, i.e., the
detection of nucleotides in a target sequence that are the
same as those in a reference sequence. However, similar
information can be obtained by hybridizing a chip lacking
probes from the first probe set to both target and reference
sequences. The hybridization can be performed sequentially,
or concurrently, if the target and reference are
differentially labelled. In this situation, the presence of a
mutation is detected by a shift in the background
hybridization intensity of the reference sequence to a
perfectly matched hybridization signal of the target sequence,
rather than by a comparison of the hybridization intensities
of probes from the first set with corresponding probes from
the second, third and fourth sets.

3. Wildtype Probe Lane

When the chips comprise four probe sets, as discussed
supra, and the probe sets are laid down in four lanes, an A
lane, a C lane, a G lane and a T or U lane, the probe having a
segment exhibiting perfect complementarity to a reference
sequence varies between the four lanes from one column to
another. This does not present any significant difficulty in
computer analysis of the data from the chip. However, visual
inspection of the hybridization pattern of the chip is
sometimes facilitated by provision of an extra lane of probes,
in which each probe has a segment exhibiting perfect
complementarity to the reference sequence. See Fig. 4. This
segment is identical to a segment from one of the probes in
the other four lanes (which lane depending on the column
position). The extra lane of probes (designated the wildtype
lane) hybridizes to a target sequence at all nucleotide
positions except those in which deviations from the reference
sequence occur. The hybridization pattern of the wildtype
lane thereby provides a simple visual indication of mutations.

4. Deletion, Insertion and Multiple-Mutation Probes
Some chips provide an additional probe set specifically
designed for analyzing deletion mutations. The additional
probe set comprises a probe corresponding to each probe in the
first probe set as described above. However, a probe from the
additional probe set differs from the corresponding probe in
the first probe set in that the nucleotide occupying the
interrogation position is deleted in the probe from the
additional probe set. See Fig. 6. Optionally, the probe from
the additional probe set bears an additional nucleotide at one
of its termini relative to the corresponding probe from the
first probe set. The probe from the additional probe set will
hybridize more strongly than the corresponding probe from the
first probe set to a target sequence having a single base
deletion at the nucleotide corresponding to the interrogation
position. Additional probe sets are provided in which not
only the interrogation position, but also an adjacent
nucleotide is detected.

Similarly, other chips provide additional probe sets for
analyzing insertions. For example, one additional probe set
has a probe corresponding to each probe in the first probe set
as described above. However, the probe in the additional
probe set has an extra T nucleotide inserted adjacent to the
interrogation position. See Fig. 6. Optionally, the probe
has one fewer nucleotide at one of its termini relative to the
corresponding probe from the first probe set. The probe from
the additional probe set hybridizes more strongly than the
corresponding probe from the first probe set to a target
sequence having an A nucleotide inserted in a position
adjacent to that corresponding to the interrogation position.
Similar additional probe sets are constructed having C, G or
T/U nucleotides inserted adjacent to the interrogation
position. Usually, four such probe sets, one for each
nucleotide, are used in combination.
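
The construction of deletion and insertion probes from a first-set probe can be sketched as follows. This is a minimal illustration under stated assumptions: probes are represented as strings, the interrogation position is given as a zero-based index, the extra nucleotide is placed immediately after the interrogation position (the text says only "adjacent"), and the optional adjustment of one nucleotide at a terminus is not modelled. The function names are hypothetical.

```python
def deletion_probe(first_set_probe, interrogation_index):
    """Probe with the nucleotide at the interrogation position deleted."""
    i = interrogation_index
    return first_set_probe[:i] + first_set_probe[i + 1:]


def insertion_probes(first_set_probe, interrogation_index):
    """Four probes, each with one extra base inserted next to the
    interrogation position (here: immediately after it)."""
    i = interrogation_index
    return {
        base: first_set_probe[:i + 1] + base + first_set_probe[i + 1:]
        for base in "ACGT"
    }
```

For a 14-mer such as ATTGGTGAGTGCCC with the interrogation position at index 5, the deletion probe is the 13-mer ATTGGGAGTGCCC, and the T-insertion probe is the 15-mer ATTGGTTGAGTGCCC, which would pair with a target carrying an inserted A at the corresponding site.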

Other chips provide additional probes (multiple-mutation
probes) for analyzing target sequences having multiple closely
spaced mutations. A multiple-mutation probe is usually
identical to a corresponding probe from the first set as
described above, except in the base occupying the
interrogation position, and except at one or more additional
positions, corresponding to nucleotides in which substitution
may occur in the reference sequence. The one or more
additional positions in the multiple-mutation probe are
occupied by nucleotides complementary to the nucleotides
occupying corresponding positions in the reference sequence
when the possible substitutions have occurred.

5. Block Tiling

As noted in the discussion of the general tiling
strategy, a probe in the first probe set sometimes has more
than one interrogation position. In this situation, a probe
in the first probe set is sometimes matched with multiple
groups of at least one, and usually three, additional probe
sets. See Fig. 7. Three additional probe sets are used to
allow detection of the three possible nucleotide substitutions
at any one position. If only certain types of substitution
are likely to occur (e.g., transitions), only one or two
additional probe sets are required (analogous to the use of
probes in the basic tiling strategy). To illustrate for the
situation where a group comprises three additional probe sets,
a first such group comprises second, third and fourth probe
sets, each of which has a probe corresponding to each probe in
the first probe set. The corresponding probes from the
second, third and fourth probe sets differ from the
corresponding probe in the first set at a first of the
interrogation positions. Thus, the relative hybridization
signals from corresponding probes from the first, second,
third and fourth probe sets indicate the identity of the
nucleotide in a target sequence corresponding to the first
interrogation position. The probe sets of a second group of
three (designated fifth, sixth and seventh probe sets) each also
have a probe corresponding to each probe in the first probe
set. These corresponding probes differ from that in the first
probe set at a second interrogation position. The relative
hybridization signals from corresponding probes from the
first, fifth, sixth, and seventh probe sets indicate the
identity of the nucleotide in the target sequence



WO 95/11995 PCT/US94/12305 

corresponding to the second interrogation position. As noted
above, the probes in the first probe set often have seven or
more interrogation positions. If there are seven
interrogation positions, there are seven groups of three
additional probe sets, each group of three probe sets serving
to identify the nucleotide corresponding to one of the seven
interrogation positions.
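
The composition of one block can be sketched as follows. This is an illustrative sketch only, assuming that probes are represented as strings, that interrogation positions are zero-based indices into the first-set probe, and that all three substitutions are tiled at every position; the function name and the dictionary keys are hypothetical.

```python
def block_probes(first_set_probe, interrogation_positions):
    """Build one block: the first-set probe plus, for each interrogation
    position, three probes carrying the other possible bases there."""
    block = {"first": first_set_probe}
    for pos in interrogation_positions:
        for base in "ACGT":
            if base != first_set_probe[pos]:
                block[(pos, base)] = (
                    first_set_probe[:pos] + base + first_set_probe[pos + 1:]
                )
    return block
```

With seven interrogation positions the block holds 1 + 3 x 7 = 22 probes, compared with 4 x 7 = 28 probes if each of the seven nucleotides were read by its own column of four probes, and every probe in the block shares the same flanking sequences outside the interrogated segment.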

Each block of probes allows short regions of a target
sequence to be read. For example, for a block of probes
having seven interrogation positions, seven nucleotides in the
target sequence can be read. Of course, a chip can contain
any number of blocks depending on how many nucleotides of the
target are of interest. The hybridization signals for each
block can be analyzed independently of any other block. The
block tiling strategy can also be combined with other tiling
strategies, with different parts of the same reference
sequence being tiled by different strategies.

The block tiling strategy offers three advantages over the
basic strategy in which each probe in the first set has a
single interrogation position. One advantage is that the same
sequence information can be obtained from fewer probes. A
second advantage is that each of the probes constituting a
block (i.e., a probe from the first probe set and a
corresponding probe from each of the other probe sets) can
have identical 3' and 5' sequences, with the variation
confined to a central segment containing the interrogation
positions. The identity of 3' sequence between different
probes simplifies the strategy for solid phase synthesis of
the probes on the chip and results in more uniform deposition
of the different probes on the chip, thereby in turn
increasing the uniformity of signal to noise ratio for
different regions of the chip. A third advantage is that
greater signal uniformity is achieved within a block.

6. Multiplex Tiling

In the block tiling strategy discussed above, the 
identity of a nucleotide in a target or reference sequence is 
determined by comparison of hybridization patterns of one 
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probe having a segment showing a perfect match with that of 
other probes (usually three other probes) showing a single 
base mismatch. In multiplex tiling, the identity of at least 
two nucleotides in a reference or target sequence is 
determined by comparison of hybridization signal intensities 
of four probes, two of which have a segment showing perfect 
complementarity or a single base mismatch to the reference
sequence, and two of which have a segment showing perfect 
complementarity or a double-base mismatch to a segment. The 
four probes whose hybridization patterns are to be compared 
each have a segment that is exactly complementary to a 
reference sequence except at two interrogation positions, in 
which the segment may or may not be complementary to the 
reference sequence. The interrogation positions correspond to 
the nucleotides in a reference or target sequence which are 
determined by the comparison of intensities. The nucleotides 
occupying the interrogation positions in the four probes are 
selected according to the following rule. The first 
interrogation position is occupied by a different nucleotide 
in each of the four probes. The second interrogation position 
is also occupied by a different nucleotide in each of the four 
probes. In two of the four probes, designated the first and 
second probes, the segment is exactly complementary to the 
reference sequence except at not more than one of the two 
interrogation positions. In other words, one of the 
interrogation positions is occupied by a nucleotide that is 
complementary to the corresponding nucleotide from the 
reference sequence and the other interrogation position may or 
may not be so occupied. In the other two of the four probes, 
designated the third and fourth probes, the segment is exactly 
complementary to the reference sequence except that both 
interrogation positions are occupied by nucleotides which are 
noncomplementary to the respective corresponding nucleotides 
in the reference sequence. 

There are a number of ways of satisfying these conditions
depending on whether the two nucleotides in the reference
sequence corresponding to the two interrogation positions are
the same or different. If these two nucleotides are different
in the reference sequence (probability 3/4), the conditions
are satisfied by each of the two interrogation positions being
occupied by the same nucleotide in any given probe. For
example, in the first probe, the two interrogation positions
would both be A, in the second probe, both would be C, in the
third probe, each would be G, and in the fourth probe each
would be T or U. If the two nucleotides in the reference
sequence corresponding to the two interrogation positions are
the same, the conditions noted above are satisfied by each of
the interrogation positions in any one of the four probes
being occupied by complementary nucleotides. For example, in
the first probe, the interrogation positions could be occupied
by A and T, in the second probe by C and G, in the third probe
by G and C, and in the fourth probe, by T and A. See Fig. 8.

When the four probes are hybridized to a target that is 
the same as the reference sequence or differs from the 
reference sequence at one (but not both) of the interrogation 
positions, two of the four probes show a double-mismatch with
the target and two probes show a single mismatch. The
identity of probes showing these different degrees of mismatch 
can be determined from the different hybridization signals. 
From the identity of the probes showing the different degrees 
of mismatch, the nucleotides occupying both of the 
interrogation positions in the target sequence can be deduced. 
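
The multiplex rule and the resulting deduction can be sketched as follows. This is a hedged illustration, not a definitive implementation. For simplicity every nucleotide pair is expressed in probe terms, i.e., as the two bases a probe would need at its interrogation positions to pair perfectly with the sequence in question. It is assumed that the target either equals the reference or differs from it at exactly one of the two positions, and that the degree of mismatch of each probe (0, 1 or 2) has already been inferred from the hybridization signals. All function names are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def design_multiplex_probes(ref_match):
    """Interrogation-position pairs for the four probes.

    ref_match: the two probe bases that pair with the reference at the
    two interrogation positions. If they differ, each probe carries the
    same base at both positions; if they are the same, each probe
    carries a base and its complement.
    """
    first, second = ref_match
    if first != second:
        return [(b, b) for b in BASES]
    return [(b, COMPLEMENT[b]) for b in BASES]


def mismatch_pattern(probes, target_match):
    """Number of mismatched interrogation positions (0, 1 or 2) per probe."""
    return tuple(
        (p[0] != target_match[0]) + (p[1] != target_match[1]) for p in probes
    )


def candidate_targets(ref_match):
    """The reference itself plus every variant differing at one position."""
    first, second = ref_match
    candidates = [ref_match]
    candidates += [(b, second) for b in BASES if b != first]
    candidates += [(first, b) for b in BASES if b != second]
    return candidates


def decode(probes, ref_match, observed_pattern):
    """Return the unique candidate consistent with the pattern, else None."""
    hits = [
        t for t in candidate_targets(ref_match)
        if mismatch_pattern(probes, t) == tuple(observed_pattern)
    ]
    return hits[0] if len(hits) == 1 else None
```

Under these assumptions each of the seven candidates (the reference and its six single-position variants) gives a distinct pattern of mismatch degrees across the four probes, so both interrogation positions can be read from one group of four probes.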

For ease of illustration, the multiplex strategy has been 
initially described for the situation where there are two 
nucleotides of interest in a reference sequence and only four 
probes in an array. Of course, the strategy can be extended 
to analyze any number of nucleotides in a target sequence by 
using additional probes. In one variation, each pair of 
interrogation positions is read from a unique group of four 
probes. In a block variation, different groups of four probes 
exhibit the same segment of complementarity with the reference 
sequence, but the interrogation positions move within a block. 
The block and standard multiplex tiling variants can of course 
be used in combination for different regions of a reference 
sequence. Either or both variants can also be used in 
combination with any of the other tiling strategies described.

7. Helper Mutations

Occasionally small regions of a reference sequence give a
low hybridization signal as a result of self-annealing of probes.
The self-annealing reduces the amount of probe effectively
available for hybridizing to the target. Although such
regions of the target are generally small and the reduction of
hybridization signal is usually not so substantial as to
obscure the sequence of this region, this concern can be
avoided by the use of probes incorporating helper mutations.
The helper mutation(s) serve to break up regions of internal
complementarity within a probe and thereby prevent annealing.
Usually, one or two helper mutations are quite sufficient for
this purpose. The inclusion of helper mutations can be
beneficial in any of the tiling strategies noted above. In
general each probe having a particular interrogation position
has the same helper mutation(s). Thus, such probes have a
segment in common which shows perfect complementarity with a
reference sequence, except that the segment contains at least
one helper mutation (the same in each of the probes) and at
least one interrogation position (different in all of the
probes). For example, in the basic tiling strategy, a probe
from the first probe set comprises a segment containing an
interrogation position and showing perfect complementarity
with a reference sequence except for one or two helper
mutations. The corresponding probes from the second, third
and fourth probe sets usually comprise the same segment (or
sometimes a subsequence thereof including the helper
mutation(s) and interrogation position), except that the base
occupying the interrogation position varies in each probe.
See Fig. 9.

Usually, the helper mutation tiling strategy is used in
conjunction with one of the tiling strategies described above.
The probes containing helper mutations are used to tile
regions of a reference sequence otherwise giving low
hybridization signal (e.g., because of self-complementarity),
and the alternative tiling strategy is used to tile
intervening regions.

8. Pooling Strategies

Pooling strategies also employ arrays of immobilized
probes. Probes are immobilized in cells of an array, and the
hybridization signal of each cell can be determined
independently of any other cell. A particular cell may be
occupied by a pooled mixture of probes. Although the identity
of each probe in the mixture is known, the individual probes
in the pool are not separately addressable. Thus, the
hybridization signal from a cell is the aggregate of that of
the different probes occupying the cell. In general, a cell
is scored as hybridizing to a target sequence if at least one
probe occupying the cell comprises a segment exhibiting
perfect complementarity to the target sequence.

A simple strategy to show the increased power of pooled
strategies over a standard tiling is to create three cells
each containing a pooled probe having a single pooled
position, the pooled position being the same in each of the
pooled probes. At the pooled position, there are two possible
nucleotides, allowing the pooled probe to hybridize to two
target sequences. In tiling terminology, the pooled position
of each probe is an interrogation position. As will become
apparent, comparison of the hybridization intensities of the
pooled probes from the three cells reveals the identity of the
nucleotide in the target sequence corresponding to the
interrogation position (i.e., that is matched with the
interrogation position when the target sequence and pooled
probes are maximally aligned for complementarity).

The three cells are assigned probe pools that are
perfectly complementary to the target except at the pooled
position, which is occupied by a different pooled nucleotide
in each probe as follows:

[AC] = M, [GT] = K, [AG] = R
(as substitutions in the probe, IUPAC standard ambiguity notation)

X - interrogation position

Target: TAACCACTCACGGGAGCA

Pool 1: ATTGGMGAGTGCCC
       =ATTGGaGAGTGCCC (complement to mutant 't')
       +ATTGGcGAGTGCCC (complement to mutant 'g')

Pool 2: ATTGGKGAGTGCCC
       =ATTGGgGAGTGCCC (complement to mutant 'c')
       +ATTGGtGAGTGCCC (complement to wild type 'a')

Pool 3: ATTGGRGAGTGCCC
       =ATTGGaGAGTGCCC (complement to mutant 't')
       +ATTGGgGAGTGCCC (complement to mutant 'c')

With 3 pooled probes, all 4 possible single base pair states
(wild and 3 mutants) are detected. A pool hybridizes with a
target if some probe contained within that pool is
complementary to that target.

                              Hybridization?
                       Pool:   1    2    3
Target: TAACCACTCACGGGAGCA     n    y    n
Mutant: TAACCcCTCACGGGAGCA     n    y    y
Mutant: TAACCgCTCACGGGAGCA     y    n    n
Mutant: TAACCtCTCACGGGAGCA     y    n    y

A cell containing a pair (or more) of oligonucleotides
lights up when a target complementary to any of the
oligonucleotides in the cell is present. Using the simple
strategy, each of the four possible targets (wild and three
mutants) yields a unique hybridization pattern among the three
cells.
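
The three-pool scheme above can be checked with a short sketch. The pool codes M, K and R and their memberships are those given in the example; the function names and the representation of a hybridization pattern as a tuple of booleans (pool 1, pool 2, pool 3) are assumptions made for illustration.

```python
IUPAC = {"M": "AC", "K": "GT", "R": "AG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
POOLS = ["M", "K", "R"]  # pooled nucleotide used in pools 1, 2 and 3


def pool_pattern(target_base):
    """Which of the three pools hybridize, given the target nucleotide
    at the position corresponding to the pooled position."""
    probe_base = COMPLEMENT[target_base]  # base a probe needs to match
    return tuple(probe_base in IUPAC[code] for code in POOLS)


def decode_pool_pattern(pattern):
    """Recover the target nucleotide from a (pool 1, pool 2, pool 3) pattern."""
    table = {pool_pattern(base): base for base in "ACGT"}
    return table.get(tuple(pattern))
```

The wild-type A gives the pattern (no, yes, no), and the C, G and T mutants give (no, yes, yes), (yes, no, no) and (yes, no, yes) respectively, in agreement with the table, so the four patterns are distinct and the nucleotide can be read back from three cells.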

Since a different pattern of hybridizing pools is
obtained for each possible nucleotide in the target sequence
corresponding to the pooled interrogation position in the
probes, the identity of the nucleotide can be determined from
the hybridization pattern of the pools. Whereas a standard
tiling requires four cells to detect and identify the possible
single-base substitutions at one location, this simple pooled
strategy requires only three cells.

A more efficient pooling strategy for sequence analysis
is the 'trellis' strategy. In this strategy, each pooled
probe has a segment of perfect complementarity to a reference
sequence except at three pooled positions. One pooled
position is an N pool. The three pooled positions may or may
not be contiguous in a probe. The other two pooled positions
are selected from the group of three pools consisting of (1) M
or K, (2) R or Y and (3) W or S, where the single letters are
IUPAC standard ambiguity codes. The sequence of a pooled
probe is thus of the form XXXN[(M/K) or (R/Y) or (W/S)][(M/K)
or (R/Y) or (W/S)]XXXXX, where XXX represents bases
complementary to the reference sequence. The three pooled
positions may be in any order, and may be contiguous or
separated by intervening nucleotides. For the two positions
occupied by [(M/K) or (R/Y) or (W/S)], two choices must be
made. First, one must select one of the following three pairs
of pooled nucleotides: (1) M/K, (2) R/Y and (3) W/S. The pair
selected may be the same or different at the two pooled
positions. Second, supposing, for
example, one selects M/K at one position, one must then choose
between M and K. This choice should result in selection of a
pooled nucleotide comprising a nucleotide that complements the
corresponding nucleotide in a reference sequence, when the
probe and reference sequence are maximally aligned. The same
principle governs the selection between R and Y, and between W
and S. A trellis pool probe has one pooled position with four
possibilities, and two pooled positions, each with two
possibilities. Thus, a trellis pool probe comprises a mixture
of 16 (4x2x2) probes. Since each pooled position includes
one nucleotide that complements the corresponding nucleotide
from the reference sequence, one of these 16 probes has a
segment that is the exact complement of the reference
sequence. A target sequence that is the same as the reference
sequence (i.e., a wildtype target) gives a hybridization
signal to each probe cell. Here, as in other tiling methods,
the segment of complementarity should be sufficiently long to
permit specific hybridization of a pooled probe to a reference
sequence to be detected relative to a variant of that reference
sequence. Typically, the segment of complementarity is about
9-21 nucleotides.

A target sequence is analyzed by comparing hybridization
intensities at three pooled probes, each having the structure
described above. The segments complementary to the reference
sequence present in the three pooled probes show some overlap.
Sometimes the segments are identical (other than at the
interrogation positions). However, this need not be the case.
For example, the segments can tile across a reference sequence
in increments of one nucleotide (i.e., one pooled probe
differs from the next by the acquisition of one nucleotide at
the 5' end and loss of a nucleotide at the 3' end). The three
interrogation positions may or may not occur at the same
relative positions within each pooled probe (i.e., spacing
from a probe terminus). All that is required is that one of
the three interrogation positions from each of the three
pooled probes aligns with the same nucleotide in the reference
sequence, and that this interrogation position is occupied by
a different pooled nucleotide in each of the three probes. In
one of the three probes, the interrogation position is
occupied by an N. In the other two pooled probes the
interrogation position is occupied by one of (M/K) or (R/Y) or
(W/S).

In the simplest form of the trellis strategy, three
pooled probes are used to analyze a single nucleotide in the
reference sequence. Much greater economy of probes is
achieved when more pooled probes are included in an array.
For example, consider an array of five pooled probes each
having the general structure outlined above. Three of these
pooled probes have an interrogation position that aligns with
the same nucleotide in the reference sequence and are used to
read that nucleotide. A different combination of three probes
has an interrogation position that aligns with a different
nucleotide in the reference sequence. Comparison of these
three probe intensities allows analysis of this second
nucleotide. Still another combination of three pooled probes
from the set of five has an interrogation position that
aligns with a third nucleotide in the reference sequence, and
these probes are used to analyze that nucleotide. Thus, three
nucleotides in the reference sequence are fully analyzed from
only five pooled probes. By comparison, the basic tiling
strategy would require 12 probes for a similar analysis.
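
The probe economy can be made concrete with a small calculation.
This Python sketch assumes the arrangement used in the examples of
this section (successive pooled probes tiling in increments of one
nucleotide, each with three contiguous pooled positions), so that
every additional readable position costs one additional pooled
probe. The function names are illustrative.

```python
def trellis_probes(n_positions):
    """Pooled probes needed to read n contiguous positions, assuming
    one-nucleotide tiling and three contiguous pooled positions per
    probe (each position must be covered by three probes)."""
    return n_positions + 2


def basic_tiling_probes(n_positions):
    """Basic tiling: four probes (A, C, G, T) per position."""
    return 4 * n_positions


for n in (1, 3, 8):
    print(n, trellis_probes(n), basic_tiling_probes(n))
# 1 3 4
# 3 5 12
# 8 10 32
```

The middle row reproduces the comparison in the text: five pooled
probes against 12 probes in the basic tiling strategy.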

As an example, a pooled probe for analysis of a target
sequence by the trellis strategy is shown below:

Target:   ATTAACCACTCACGGGAGCTCT
Pool:         TGGTGNKYGCCCT

The pooled probe actually comprises 16 individual probes:

     TGGTGAGcGCCCT
    +TGGTGcGcGCCCT
    +TGGTGgGcGCCCT
    +TGGTGtGcGCCCT
    +TGGTGAtcGCCCT
    +TGGTGctcGCCCT
    +TGGTGgtcGCCCT
    +TGGTGttcGCCCT
    +TGGTGAGTGCCCT
    +TGGTGcGTGCCCT
    +TGGTGgGTGCCCT
    +TGGTGtGTGCCCT
    +TGGTGAtTGCCCT
    +TGGTGctTGCCCT
    +TGGTGgtTGCCCT
    +TGGTGttTGCCCT



The trellis strategy employs an array of probes having at
least three cells, each of which is occupied by a pooled probe
as described above.
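
The expansion of a pooled probe such as TGGTGNKYGCCCT into its 16
component probes can be sketched in Python as follows. The IUPAC
table holds the standard ambiguity codes named in the text; the
function name expand_pool is an illustrative choice.

```python
from itertools import product

# Standard IUPAC ambiguity codes used in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}


def expand_pool(pool):
    """List every individual probe contained in a pooled probe."""
    choices = [IUPAC[code] for code in pool.upper()]
    return ["".join(bases) for bases in product(*choices)]


probes = expand_pool("TGGTGNKYGCCCT")
print(len(probes))  # 16, i.e. 4 x 2 x 2
print(probes[0])    # TGGTGAGCGCCCT
```

Unlike the listing above, this sketch prints every base in upper
case rather than marking the non-wildtype choices in lower case.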

Consider the use of three such pooled probes for
analyzing a target sequence, of which one position may contain
any single base substitution to the reference sequence (i.e.,
there are four possible target sequences to be distinguished).
Three cells are occupied by pooled probes having a pooled
interrogation position corresponding to the position of
possible substitution in the target sequence, one cell with an
'N', one cell with one of 'M' or 'K', and one cell with 'R' or
'Y'. An interrogation position corresponds to a nucleotide in
the target sequence if it aligns with that nucleotide
when the probe and target sequence are aligned to maximize
complementarity. Note that although each of the pooled
probes has two other pooled positions, these positions are not
relevant for the present illustration. The positions are only
relevant when more than one position in the target sequence is
to be read, a circumstance that will be considered later. For
present purposes, the cell with the 'N' in the interrogation
position lights up for the wildtype sequence and any of the
three single base substitutions of the target sequence. The
cell with M/K in the interrogation position lights up for the
wildtype sequence and one of the single-base substitutions.
The cell with R/Y in the interrogation position lights up for
the wildtype sequence and a second of the single-base
substitutions. Thus, the four possible target sequences
hybridize to the three pools of probes in four distinct
patterns, and the four possible target sequences can be
distinguished.

To illustrate further, consider four possible target
sequences (differing at a single position) and a pooled probe
having three pooled positions, N, K and Y, with the Y position
as the interrogation position (i.e., aligned with the variable
position in the target sequence):

Target
Wild:     ATTAACCACTCACGGGAGCTCT   (w)
Mutants:  ATTAACCACTCcCGGGAGCTCT   (c)
Mutants:  ATTAACCACTCgCGGGAGCTCT   (g)
Mutants:  ATTAACCACTCtCGGGAGCTCT   (t)
              TGGTGNKYGCCCT        (pooled probe).

The sixteen individual component probes of the pooled probe
hybridize to the four possible target sequences as follows:

                        TARGET
                   w    c    g    t
TGGTGAGcGCCCT      n    n    y    n
TGGTGcGcGCCCT      n    n    n    n
TGGTGgGcGCCCT      n    n    n    n
TGGTGtGcGCCCT      n    n    n    n
TGGTGAtcGCCCT      n    n    n    n
TGGTGctcGCCCT      n    n    n    n
TGGTGgtcGCCCT      n    n    n    n
TGGTGttcGCCCT      n    n    n    n
TGGTGAGTGCCCT      y    n    n    n
TGGTGcGTGCCCT      n    n    n    n
TGGTGgGTGCCCT      n    n    n    n
TGGTGtGTGCCCT      n    n    n    n
TGGTGAtTGCCCT      n    n    n    n
TGGTGctTGCCCT      n    n    n    n
TGGTGgtTGCCCT      n    n    n    n
TGGTGttTGCCCT      n    n    n    n



The pooled probe hybridizes according to the aggregate of its
components:

Pool:  TGGTGNKYGCCCT      y    n    y    n

Thus, as stated above, it can be seen that a pooled probe
having a Y at the interrogation position hybridizes to the
wildtype target and one of the mutants. Similar tables can be
drawn to illustrate the hybridization patterns of probe pools
having other pooled nucleotides at the interrogation position.
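
The aggregate behavior of the pool can be reproduced with a short
Python sketch. It rests on an idealized assumption that is not a
claim about real hybridization chemistry: a cell lights up only if
some component probe is the exact complement of the aligned target
segment. The probe is written 3' to 5' beneath the 5' to 3' target,
and the offset of 4 (the probe starts at the fifth target base) is
inferred from the alignment in the example. Function and variable
names are illustrative.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pool_hybridizes(pool, target, offset):
    """True if some component of the pool exactly complements
    target[offset:offset + len(pool)] (all-or-nothing model)."""
    segment = target[offset:offset + len(pool)].upper()
    if len(segment) != len(pool):
        return False  # probe would overhang the end of the target
    # Positions are independent, so a perfectly matching component
    # exists exactly when every position admits the complement.
    return all(COMPLEMENT[base] in IUPAC[code]
               for code, base in zip(pool.upper(), segment))


TARGETS = {
    "w": "ATTAACCACTCACGGGAGCTCT",
    "c": "ATTAACCACTCcCGGGAGCTCT",
    "g": "ATTAACCACTCgCGGGAGCTCT",
    "t": "ATTAACCACTCtCGGGAGCTCT",
}
pattern = "".join(
    "y" if pool_hybridizes("TGGTGNKYGCCCT", seq, offset=4) else "n"
    for seq in TARGETS.values()
)
print(pattern)  # ynyn, for targets w, c, g, t in that order
```

The printed pattern agrees with the pooled-probe row shown above.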

The above strategy of using pooled probes to analyze a
single base in a target sequence can readily be extended to
analyze any number of bases. At this point, the purpose of
including three pooled positions within each probe will become
apparent. In the example that follows, ten pools of probes,
each containing three pooled probe positions, can be used to
analyze each of a contiguous sequence of eight nucleotides
in a target sequence.

      ATTAACCACTCACGGGAGCTCT     Reference sequence
             ^^^^^^^^            Readable nucleotides

Pools:
 4    TAATTNKYGAGTG
 5     AATTGNKRAGTGC
 6      ATTGGNKRGTGCC
 7       TTGGTNMRTGCCC
 8        TGGTGNKYGCCCT
 9         GGTGANKRCCCTC
10          GTGAGNKYCCTCG
11           TGAGTNMYCTCGA
12            GAGTGNMYTCGAG
13             AGTGCNMYCGAGA

In this example, the different pooled probes tile across
the reference sequence, each pooled probe differing from the 
next by increments of one nucleotide. For each of the 
readable nucleotides in the reference sequence, there are 
three probe pools having a pooled interrogation position 
aligned with the readable nucleotide. For example, the 12th 
nucleotide from the left in the reference sequence is aligned 
with pooled interrogation positions in pooled probes 8, 9, and 
10. Comparison of the hybridization intensities of these 
pooled probes reveals the identity of the nucleotide occupying 
position 12 in a target sequence. 



                                        Pools
           Targets                   8    9    10
Wild:      ATTAACCACTCACGGGAGCTCT    Y    Y    Y
Mutants:   ATTAACCACTCcCGGGAGCTCT    N    Y    Y
Mutants:   ATTAACCACTCgCGGGAGCTCT    Y    N    Y
Mutants:   ATTAACCACTCtCGGGAGCTCT    N    N    Y



Example Intensities: [figure: lit and blank cells at pools 8, 9
and 10 for the wildtype target, the 'C', 'G' and 'T' mutants, and
no target ('None')]

Thus, for example, if pools 8, 9 and 10 all light up, one
knows the target sequence is wildtype. If only pools 9 and 10
light up, the target sequence has a C mutant at position 12.
If only pools 8 and 10 light up, the target sequence has a G
mutant at position 12. If only pool 10 lights up, the target
sequence has a T mutant at position 12.

The identity of other nucleotides in the target sequence
is determined by a comparison of other sets of three pooled
probes. For example, the identity of the 13th nucleotide in
the target sequence is determined by comparing the
hybridization patterns of the probe pools designated 9, 10 and
11. Similarly, the identity of the 14th nucleotide in the
target sequence is determined by comparing the hybridization
patterns of the probe pools designated 10, 11, and 12.
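
Reading a position from three pools can be sketched in Python as a
comparison of the observed lit/blank pattern with the pattern
predicted for each candidate base. The sketch uses the same
idealized all-or-nothing hybridization model as before, which is an
assumption, and the 0-based offsets of pools 8 to 12 on the
reference are inferred from the one-nucleotide tiling of the
example. All names are illustrative.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
REFERENCE = "ATTAACCACTCACGGGAGCTCT"

# Pool number -> (pooled probe, 0-based offset on the reference).
POOLS = {
    8: ("TGGTGNKYGCCCT", 4),
    9: ("GGTGANKRCCCTC", 5),
    10: ("GTGAGNKYCCTCG", 6),
    11: ("TGAGTNMYCTCGA", 7),
    12: ("GAGTGNMYTCGAG", 8),
}


def lit(pool, offset, target):
    """Idealized model: lit only if a component matches perfectly."""
    segment = target[offset:offset + len(pool)]
    if len(segment) != len(pool):
        return False
    return all(COMPLEMENT[b] in IUPAC[c] for c, b in zip(pool, segment))


def pattern(target, pool_numbers):
    return tuple(lit(*POOLS[k], target) for k in pool_numbers)


def substitute(position, base):
    """Reference with `base` placed at a 1-based position."""
    return REFERENCE[:position - 1] + base + REFERENCE[position:]


def read_position(observed, position, pool_numbers):
    """Bases at `position` whose predicted pattern equals `observed`."""
    return [base for base in "ACGT"
            if pattern(substitute(position, base), pool_numbers) == observed]


# Simulate a target with G at position 12 and read it back.
observed = pattern(substitute(12, "G"), (8, 9, 10))
print(observed)                                # (True, False, True)
print(read_position(observed, 12, (8, 9, 10)))  # ['G']
```

Exactly one candidate is returned because the four possible bases
give four distinct patterns over the three pools.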

In the above example, successive probes tile across the
reference sequence in increments of one nucleotide, and each
probe has three interrogation positions occupying the same
positions in each probe relative to the terminus of the probe
(i.e., the 7, 8 and 9th positions relative to the 3'
terminus). However, the trellis strategy does not require
that probes tile in increments of one or that the
interrogation positions occur in the same position in
each probe. In a variant of trellis tiling referred to as
"loop" tiling, a nucleotide of interest in a target sequence
is read by comparison of pooled probes, which each have a
pooled interrogation position corresponding to the nucleotide
of interest, but in which the spacing of the interrogation
position in the probe differs from probe to probe.
Analogously to the block tiling approach, this allows several
nucleotides to be read from a target sequence from a
collection of probes that are identical except at the
interrogation position. The identity in sequence of probes,
particularly at their 3' termini, simplifies synthesis of the
array and results in more uniform probe density per cell.

To illustrate the loop strategy, consider a reference
sequence of which the 4, 5, 6, 7 and 8th nucleotides (from the
3' terminus) are to be read. All of the four possible
nucleotides at each of these positions can be read from
comparison of hybridization intensities of five pooled probes.
Note that the pooled positions in the probes are different
(for example, in probe 55 the pooled positions are 4, 5 and 6,
and in probe 56 they are 5, 6 and 7).

      TAACCACTCACGGGAGCA     Reference sequence

55    ATTNKYGAGTGCC
56    ATTGNKRAGTGCC
57    ATTGGNKRGTGCC
58    ATTRGTNMGTGCC
59    ATTKRTGNGTGCC

Each position of interest in the reference sequence is read by
comparing hybridization intensities for the three probe pools
that have an interrogation position aligned with the
nucleotide of interest in the reference sequence. For
example, to read the fourth nucleotide in the reference
sequence, probes 55, 58 and 59 provide pools at the fourth
position. Similarly, to read the fifth nucleotide in the
reference sequence, probes 55, 56 and 59 provide pools at the
fifth position. As in the previous trellis strategy, one of
the three probes being compared has an N at the pooled
position and the other two have different pools selected from
(1) M or K, (2) R or Y and (3) W or S.

The hybridization pattern of the five pooled probes to 
target sequences representing each possible nucleotide 
substitution at five positions in the reference sequence is 
shown below. Each possible substitution results in a unique 
hybridization pattern at three pooled probes, and the identity 
of the nucleotide at that position can be deduced from the 
hybridization pattern. 
                                       Pools
          Targets                55   56   57   58   59
Wild:     TAACCACTCACGGGAGCA     Y    Y    Y    Y    Y
Mutant:   TAAgCACTCACGGGAGCA     Y    N    N    N    N
Mutant:   TAAtCACTCACGGGAGCA     Y    N    N    Y    N
Mutant:   TAAaCACTCACGGGAGCA     Y    N    N    N    Y
Mutant:   TAACgACTCACGGGAGCA     N    Y    N    N    N
Mutant:   TAACtACTCACGGGAGCA     N    Y    N    N    Y
Mutant:   TAACaACTCACGGGAGCA     Y    Y    N    N    N
Mutant:   TAACCcCTCACGGGAGCA     N    Y    Y    N    N
Mutant:   TAACCgCTCACGGGAGCA     Y    N    Y    N    N
Mutant:   TAACCtCTCACGGGAGCA     N    N    Y    N    N
Mutant:   TAACCAgTCACGGGAGCA     N    N    N    Y    N
Mutant:   TAACCAtTCACGGGAGCA     N    Y    N    Y    N
Mutant:   TAACCAaTCACGGGAGCA     N    N    Y    Y    N
Mutant:   TAACCACaCACGGGAGCA     N    N    N    N    Y
Mutant:   TAACCACcCACGGGAGCA     N    N    Y    N    Y
Mutant:   TAACCACgCACGGGAGCA     N    N    N    Y    Y



Many variations on the loop and trellis tilings can be
created. All that is required is that each position in
sequence must have a probe with an 'N', a probe containing one
of R/Y, M/K or W/S, and a probe containing a different pool
from that set, complementary to the wild type target at that
position, and at least one probe with no pool at all at that
position. This combination allows all mutations at that
position to be uniquely detected and identified.
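
The stated requirement can be checked mechanically for a candidate
set of probes. The Python sketch below is a simplified validator
under two assumptions: all probes are aligned at the same start and
have the same length (as in the loop example), and complementarity
of each pool to the wildtype target is not verified. The names
PAIR_CLASS and position_is_readable are illustrative.

```python
# Pair classes named in the text. Any plain base (A, C, G or T) at a
# probe position counts as "no pool at all".
PAIR_CLASS = {"M": "M/K", "K": "M/K", "R": "R/Y", "Y": "R/Y",
              "W": "W/S", "S": "W/S"}


def position_is_readable(probes, index):
    """Check one 0-based probe position for: an N pool, two pools
    from different pair classes, and at least one unpooled probe."""
    codes = [probe[index] for probe in probes]
    has_n = "N" in codes
    classes = {PAIR_CLASS[c] for c in codes if c in PAIR_CLASS}
    has_unpooled = any(c in "ACGT" for c in codes)
    return has_n and len(classes) >= 2 and has_unpooled


LOOP_PROBES = [
    "ATTNKYGAGTGCC",  # 55
    "ATTGNKRAGTGCC",  # 56
    "ATTGGNKRGTGCC",  # 57
    "ATTRGTNMGTGCC",  # 58
    "ATTKRTGNGTGCC",  # 59
]

for position in range(4, 9):  # 4th to 8th nucleotides
    print(position, position_is_readable(LOOP_PROBES, position - 1))
# Each of positions 4 to 8 prints True.
```

Positions outside the pooled region, such as the third, fail the
check because no probe carries a pool there.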

A further class of strategies involving pooled probes are
termed coding strategies. These strategies assign code words
from some set of numbers to variants of a reference sequence.
Any number of variants can be coded. The variants can include
multiple closely spaced substitutions, deletions or
insertions. The designations assigned to the variants may be
any arbitrary set of numbers, letters or other symbols, in any
order. For example, a binary code is often used, but codes in
other bases are entirely feasible. The numbers are often
assigned such that each variant has a designation having at
least one digit and at least one nonzero value for that digit.
For example, in a binary system, a variant assigned the number
101 has a designation of three digits, with one possible
nonzero value for each digit.

The designations of the variants are coded into an array
of pooled probes comprising a pooled probe for each nonzero
value of each digit in the numbers assigned to the variants.
For example, if the variants are assigned successive numbers in
a numbering system of base m, and the highest number assigned
to a variant has n digits, the array would have about
n x (m-1) pooled probes. In general, log_m(3N+1) probes are
required to analyze all variants of N locations in a reference
sequence, each having three possible mutant substitutions.
For example, 10 base pairs of sequence may be analyzed with
only 5 pooled probes using a binary coding system.
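
The count quoted above can be checked with a few lines of Python.
The sketch computes the number of code digits as the smallest d
with m^d >= 3N+1 (three substitutions per location plus the
wildtype), using integer arithmetic to avoid floating-point
rounding in the logarithm. Treating the result as the number of
pooled probes is accurate for a binary code, where each digit has a
single nonzero value; for other bases the n x (m-1) estimate in
the text applies. The function name is illustrative.

```python
def code_digits(n_locations, base=2):
    """Smallest number of digits d with base**d >= 3*N + 1, i.e. the
    ceiling of log_base(3N + 1)."""
    n_targets = 3 * n_locations + 1  # 3 mutants per location + wildtype
    digits = 0
    capacity = 1
    while capacity < n_targets:
        capacity *= base
        digits += 1
    return digits


print(code_digits(10))  # 5 pooled probes for 10 locations (binary)
print(code_digits(4))   # 4
```

The second result corresponds to a four-position example with four
binary digits.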

Each pooled probe has a segment exactly complementary to the
reference sequence except that certain positions are pooled.
The segment should be sufficiently long to allow specific
hybridization of the pooled probe to the reference sequence
relative to a mutated form of the reference sequence. As in
other tiling strategies, segment lengths of 9-21 nucleotides
are typical. Often the probe has no nucleotides other than
the 9-21 nucleotide segment. The pooled positions comprise
nucleotides that allow the pooled probe to hybridize to every
variant assigned a particular nonzero value in a particular
digit. Usually, the pooled positions further comprise a
nucleotide that allows the pooled probe to hybridize to the
reference sequence. Thus, a wildtype target (or reference
sequence) is immediately recognizable from all the pooled
probes being lit.

When a target is hybridized to the pools, only those
pools comprising a component probe having a segment that is
exactly complementary to the target light up. The identity of
the target is then decoded from the pattern of hybridizing
pools. Each pool that lights up is correlated with a
particular value in a particular digit. Thus, the aggregate
hybridization patterns of each lighting pool reveal the value
of each digit in the code defining the identity of the target
hybridized to the array.
As an example, consider a reference sequence having four
positions, each of which can be occupied by three possible
mutations. Thus, in total there are 4x3 possible variant
forms of the reference sequence. Each variant is assigned one
of the binary numbers 0001-1100, and the wildtype
reference sequence is assigned the binary number 1111.

Positions:        X        X        X        X
Target:   TAAC  C=1111   A=1111   C=1111   T=1111   CACGGGAGCA
                G=0001   C=0010   G=0011   A=0100
                T=0101   G=0110   T=0111   C=1000
                A=1001   T=1010   A=1011   G=1100



A first pooled probe is designed by including probes that
complement exactly each variant having a 1 in the first digit.

Target (1111):       TAAC  C  A  C  T  CACGGGAGCA
Mutant (0001):       TAAC  g  A  C  T  CACGGGAGCA
Mutant (0101):       TAAC  t  A  C  T  CACGGGAGCA
Mutant (1001):       TAAC  a  A  C  T  CACGGGAGCA
Mutant (0011):       TAAC  C  A  g  T  CACGGGAGCA
Mutant (0111):       TAAC  C  A  t  T  CACGGGAGCA
Mutant (1101):       TAAC  C  A  a  T  CACGGGAGCA

First pooled probe:  ATTG [GCAT] T [GCAT] A  GTGCCC
                  =  ATTG    N   T    N   A  GTGCCC



Second, third and fourth pooled probes are then designed,
respectively including component probes that hybridize to each
variant having a 1 in the second, third and fourth digit.

                 XXXX - 4 positions examined
Target:      TAACCACTCACGGGAGCA
Pool 1 (1):  ATTGnTnAGTGCCC = 16 probes (4x1x4x1)
Pool 2 (2):  ATTGGnnAGTGCCC = 16 probes (1x4x4x1)
Pool 3 (4):  ATTGyrydGTGCCC = 24 probes (2x2x2x3)
Pool 4 (8):  ATTGmwmbGTGCCC = 24 probes (2x2x2x3)

The pooled probes hybridize to variant targets as follows:

Hybridization pattern:
                                               Pools
                 Targets                  1    2    3    4
Wild (1111):     TAACCACTCACGGGAGCA       Y    Y    Y    Y
Mutant (0001):   TAACgACTCACGGGAGCA       Y    N    N    N
Mutant (0101):   TAACtACTCACGGGAGCA       Y    N    Y    N
Mutant (1001):   TAACaACTCACGGGAGCA       Y    N    N    Y
Mutant (0010):   TAACCcCTCACGGGAGCA       N    Y    N    N
Mutant (0110):   TAACCgCTCACGGGAGCA       N    Y    Y    N
Mutant (1010):   TAACCtCTCACGGGAGCA       N    Y    N    Y
Mutant (0011):   TAACCAgTCACGGGAGCA       Y    Y    N    N
Mutant (0111):   TAACCAtTCACGGGAGCA       Y    Y    Y    N
Mutant (1101):   TAACCAaTCACGGGAGCA       Y    N    Y    Y
Mutant (0100):   TAACCACaCACGGGAGCA       N    N    Y    N
Mutant (1000):   TAACCACcCACGGGAGCA       N    N    N    Y
Mutant (1100):   TAACCACgCACGGGAGCA       N    N    Y    Y



The identity of a variant (i.e., mutant) target is read
directly from the hybridization pattern of the pooled probes.
For example, the mutant assigned the number 0001 gives a
hybridization pattern of NNNY with respect to probes 4, 3, 2
and 1 respectively.

In the above example, variants are assigned successive
numbers in a numbering system. In other embodiments, sets of
numbers can be chosen for their properties. If the codewords
are chosen from an error-control code, the properties of that
code carry over to sequence analysis. An error code is a
numbering system in which some designations are assigned to
variants and other designations serve to indicate errors that
may have occurred in the hybridization process. For example,
if all codewords have an odd number of nonzero digits ('binary
coding + error detection'), any single error in hybridization
will be detected by having an even number of pools lit.

Wild
Target:      TAACCACTCACGGGAGCA
Pool 1 (1):  ATTGnAnAGTGCCC = 16 probes (4x1x4x1)
Pool 2 (2):  ATTGGnnAGTGCCC = 16 probes (1x4x4x1)
Pool 3 (4):  ATTGryrhGTGCCC = 24 probes (2x2x2x3)
Pool 4 (8):  ATTGkwkvGTGCCC = 24 probes (2x2x2x3)

A fifth probe can be added to make the number of pools that
hybridize to any single mutation odd.

Pool 5 (c):  ATTGdhsmGTGCCC = 36 probes (2x2x3x3)

Hybridization of pooled probes to targets

                                                 Pool
                  Target                  1    2    3    4    5
Target (11111):   TAACCACTCACGGGAGCA      Y    Y    Y    Y    Y
Mutant (00001):   TAACgACTCACGGGAGCA      Y    N    N    N    N
Mutant (10101):   TAACtACTCACGGGAGCA      Y    N    N    N    N
Mutant (11001):   TAACaACTCACGGGAGCA      Y    N    N    Y    Y
Mutant (00010):   TAACCcCTCACGGGAGCA      N    Y    N    N    N
Mutant (10110):   TAACCgCTCACGGGAGCA      N    Y    Y    N    Y
Mutant (11010):   TAACCtCTCACGGGAGCA      N    Y    N    Y    Y
Mutant (10011):   TAACCAgTCACGGGAGCA      Y    Y    N    N    Y
Mutant (00111):   TAACCAtTCACGGGAGCA      Y    Y    Y    N    N
Mutant (01101):   TAACCAaTCACGGGAGCA      Y    N    Y    Y    N
Mutant (00100):   TAACCACaCACGGGAGCA      N    N    Y    N    N
Mutant (01000):   TAACCACcCACGGGAGCA      N    N    N    Y    N
Mutant (11100):   TAACCACgCACGGGAGCA      N    N    Y    Y    Y
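
The odd-parity scheme can be sketched in Python as follows. The
sketch assumes that the extra digit contributed by the fifth pool
is written as the leading digit of the codeword, which is how the
five-digit designations in this example read (for instance, 0101
becomes 10101). Function names are illustrative.

```python
def add_parity_digit(codeword):
    """Prefix a binary codeword with the digit that makes the total
    number of 1s odd."""
    ones = codeword.count("1")
    return ("0" if ones % 2 == 1 else "1") + codeword


def hybridization_error_detected(lit_pools):
    """With odd-weight codewords, a single spurious or missing signal
    leaves an even number of lit pools."""
    return sum(bool(p) for p in lit_pools) % 2 == 0


print(add_parity_digit("0101"))  # 10101
print(add_parity_digit("0001"))  # 00001

# Codeword 11001 read as pools 1..5: Y N N Y Y.
clean = [True, False, False, True, True]
print(hybridization_error_detected(clean))  # False

# The same pattern with the pool 4 signal lost.
faulty = [True, False, False, False, True]
print(hybridization_error_detected(faulty))  # True
```

A parity digit of this kind detects any single error but cannot
locate it, and two simultaneous errors pass unnoticed.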



9. Bridging Strategy

Probes that contain partial matches to two separate
(i.e., noncontiguous) subsequences of a target sequence
sometimes hybridize strongly to the target sequence. In
certain instances, such probes have generated stronger signals
than probes of the same length which are perfect matches to
the target sequence. It is believed (but not necessary to the
invention) that this observation results from interactions of
a single target sequence with two or more probes
simultaneously. This invention exploits this observation to
provide arrays of probes having at least first and second
segments, which are respectively complementary to first and
second subsequences of a reference sequence. Optionally, the
probes may have a third or more complementary segments. These
probes can be employed in any of the strategies noted above.
The two segments of such a probe can be complementary to
disjoint subsequences of the reference sequence or contiguous
subsequences. If the latter, the two segments in the probe
are inverted relative to the order of the complement of the
reference sequence. The two subsequences of the reference
sequence each typically comprise about 3 to 30 contiguous
nucleotides. The subsequences of the reference sequence are
sometimes separated by 0, 1, 2 or 3 bases. Often the
sequences are adjacent and nonoverlapping.

For example, a wild-type probe is created by
complementing two sections of a reference sequence (indicated
by subscript and superscript) and reversing their order. The
interrogation position is designated (*) and is apparent from
comparison of the structure of the wildtype probe with the
three mutant probes. The corresponding nucleotide in the
reference sequence is the "a" in the superscripted segment.

Reference:  5' T GGCTA CGAGG AATCATCTGTTA

                 *
Probes:     3' GCTCC CCGAT   (Probe from first probe set)
            3' GCACC CCGAT
            3' GCCCC CCGAT
            3' GCGCC CCGAT

The expected hybridizations are:

Match:

    GCTCCCCGAT
     ...TGGCTACGAGGAATCATCTGTTA
              GCTCCCCGAT

Mismatch:

    GCTCCCCGAT
     ...TGGCTACGAGGAATCATCTGTTA
              GCGCCCCGAT

Bridge tilings are specified using a notation which gives
the length of the two constituent segments and the relative
position of the interrogation position. The designation n/m
indicates a segment complementary to a region of the reference
sequence which extends for n bases and is located such that
the interrogation position is in the mth base from the 5' end.
If m is larger than n, this indicates that the entire segment
is to the 5' side of the interrogation position. If m is
negative, it indicates that the interrogation position is the
absolute value of m bases 5' of the first base of the segment
(m cannot be zero). Probes comprising multiple segments, such
as n/m + a/b + ..., have a first segment at the 3' end of the
probe and additional segments added 5' with respect to the
first segment. For example, a 4/8 + 6/3 tiling consists of
(from the 3' end of the probe) a 4 base complementary segment,
starting 7 bases 5' of the interrogation position, followed by
a 6 base region in which the interrogation position is located
at the third base. Between these two segments, one base from
the reference sequence is omitted. By this notation, the set
shown above is a 5/3 + 5/8 tiling. Many different tilings are
possible with this method, since the lengths of both segments
can be varied, as well as their relative position (they may be
in either order and there may be a gap between them) and their
location relative to the interrogation position.
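
The n/m notation can be turned into a small probe-building routine.
The Python sketch below is one reading of the notation, with these
assumptions: the reference is given 5' to 3', the interrogation
position is a 0-based index into it, and the probe is written 3' to
5' with the first listed segment first, so that each segment is
simply the base-by-base complement of its reference subsequence.
The names segment_start and bridge_probe are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def segment_start(ip, m):
    """0-based start of an n/m segment on the reference, given the
    0-based interrogation position ip."""
    if m == 0:
        raise ValueError("m cannot be zero")
    # m > 0: the interrogation position is the m-th base counted from
    # the segment's 5' end (beyond the segment when m > n).
    # m < 0: it lies |m| bases 5' of the segment's first base.
    return ip - (m - 1) if m > 0 else ip - m


def bridge_probe(reference, ip, tiling):
    """Concatenate the complements of the listed (n, m) segments; the
    result reads 3' to 5', first segment at the 3' end."""
    parts = []
    for n, m in tiling:
        start = segment_start(ip, m)
        if start < 0 or start + n > len(reference):
            raise ValueError("segment falls outside the reference")
        segment = reference[start:start + n]
        parts.append("".join(COMPLEMENT[base] for base in segment))
    return "".join(parts)


reference = "TGGCTACGAGGAATCATCTGTTA"
ip = 8  # 0-based index of the interrogated A within CGAGG

print(bridge_probe(reference, ip, [(5, 3), (5, 8)]))  # GCTCCCCGAT
print(bridge_probe(reference, ip, [(10, 5)]))         # ATGCTCCTTA
```

The first call reproduces the wild-type 5/3 + 5/8 probe of the
example; the second gives a standard 10/5 probe for the same
interrogation position.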

As an example, a 16 mer oligo target was hybridized to a
chip containing all 4^10 probes of length 10. The chip
includes short tilings of both standard and bridging types.
The data from a standard 10/5 tiling was compared to data from
a 5/3 + 5/8 bridge tiling (see Table 1). Probe intensities
(mean count/pixel) are displayed along with discrimination
ratios (correct probe intensity / highest incorrect probe
intensity). Missing intensity values are less than 50 counts.
Note that for each base displayed the bridge tiling has a
higher discrimination value.



TABLE 1: Comparison of Standard and Bridge Tilings

TILING        PROBE BASE        CORRECT PROBE BASE
                                C      A      C      C
STANDARD      A                 92     496    294    299
(10/5)        C                 536    148    532    534
              G                 69     167    72     52
              T                 146    95     212    126
              DISCRIMINATION:   3.7    3.0    1.8    1.8
BRIDGING      A                        404           156
5/3 + 5/8     C                 276           345    379
              G                        80
              T                                      58
              DISCRIMINATION:   >5.5   5.1    2.4    1.26



The bridging strategy offers the following advantages:

(1) Higher discrimination between matched and mismatched
probes,

(2) The possibility of using longer probes in a bridging
tiling, thereby increasing the specificity of the
hybridization, without sacrificing discrimination,

(3) The use of probes in which an interrogation position
is located very off-center relative to the regions of target
complementarity. This may be of particular advantage when,
for example, a probe centered about one region of the
target gives low hybridization signal. The low signal is
overcome by using a probe centered about an adjoining region
giving a higher hybridization signal.

(4) Disruption of secondary structure that might result
in annealing of certain probes (see previous discussion of
helper mutations).

10. Deletion Tiling

Deletion tiling is related to both the bridging and
helper mutant strategies described above. In the deletion
strategy, comparisons are performed between probes sharing a
common deletion but differing from each other at an
interrogation position located outside the deletion. For
example, a first probe comprises first and second segments,
each exactly complementary to respective first and second
subsequences of a reference sequence, wherein the first and
second subsequences of the reference sequence are separated by
a short distance (e.g., 1 or 2 nucleotides). The order of the
first and second segments in the probe is usually the same as
that of the complement to the first and second subsequences in
the reference sequence. The interrogation position is usually
separated from ... The comparison is performed with three other
probes, which are identical to the first probe except at an
interrogation position, which is different in each probe.

Reference:  ...AGTACCAGATCTCTAA...
Probe set:      CATGGNC AGAGA     (N = interrogation position).

Such tilings sometimes offer superior discrimination in
hybridization intensities between the probe having an
interrogation position complementary to the target and other
probes. Thermodynamically, the difference between the
hybridizations to matched and mismatched targets for the probe
set shown above is the difference between a single-base bulge
and a large asymmetric loop (e.g., two bases of target, one of
probe). This often results in a larger difference in
stability than the comparison of a perfectly matched probe
with a probe showing a single base mismatch in the basic
tiling strategy.

The superior discrimination offered by deletion tiling is
illustrated by Table 2, which compares hybridization data from
a standard 10/5 tiling with a (4/8 + 6/3) deletion tiling of
the reference sequence. (The numerators indicate the length
of the segments and the denominators, the spacing of the
deletion from the far termini of the segments.) Probe
intensities (mean count/pixel) are displayed along with
discrimination ratios (correct probe intensity / highest
incorrect probe intensity). Note that for each base displayed
the deletion tiling has a higher discrimination value than
either standard tiling shown.

TABLE 2. Comparison of Standard and Deletion Tilings

TILING        PROBE BASE        CORRECT PROBE BASE
                                C      A      C      C
STANDARD      A                 92     496    294    299
(10/5)        C                 536    148    532    534
              G                 69     167    72     52
              T                 146    95     212    126
              DISCRIMINATION:   3.7    3.0    1.8    1.8
DELETION      A                 6      412    29     48
4/8 + 6/3     C                 297    32     465    160
              G                 8      77     10     4
              T                 8      26     31     5
              DISCRIMINATION:   37.1   5.4    15     3.3
STANDARD      A                 347    533    228    277
(10/7)        C                 729    194    536    496
              G                 232    231    102    89
              T                 344    133    163    150
              DISCRIMINATION:   2.1    2.3    2.3    1.8



The use of deletion or bridging probes is quite general.
These probes can be used in any of the tiling strategies of
the invention. As well as offering superior discrimination,
the use of deletion or bridging strategies is advantageous for
certain probes to avoid self-hybridization (either within a
probe or between two probes of the same sequence).

C. Preparation of Target Samples 
The target polynucleotide, whose sequence is to be
determined, is usually isolated from a tissue sample. If the
target is genomic, the sample may be from any tissue (except
exclusively red blood cells). For example, whole blood,
peripheral blood lymphocytes or PBMC, skin, hair or semen are
convenient sources of clinical samples. These sources are
also suitable if the target is RNA. Blood and other body
fluids are also a convenient source for isolating viral
nucleic acids. If the target is mRNA, the sample is obtained
from a tissue in which the mRNA is expressed. If the
polynucleotide in the sample is RNA, it is usually reverse
transcribed to DNA. DNA samples or cDNA resulting from
reverse transcription are usually amplified, e.g., by PCR.
Depending on the selection of primers and amplifying
enzyme(s), the amplification product can be RNA or DNA.
Paired primers are selected to flank the borders of a target
polynucleotide of interest. More than one target can be
simultaneously amplified by multiplex PCR in which multiple
paired primers are employed. The target can be labelled at
one or more nucleotides during or after amplification. For
some target polynucleotides (depending on size of sample),
e.g., episomal DNA, sufficient DNA is present in the tissue
sample to dispense with the amplification step.

When the target strand is prepared in single-stranded
form as in preparation of target RNA, the sense of the strand
should of course be complementary to that of the probes on the
chip. This is achieved by appropriate selection of primers.
The target is preferably fragmented before application to the
chip to reduce or eliminate the formation of secondary
structures in the target. The average size of target
segments following hybridization is usually larger than the
size of the probes on the chip.

III. ILLUSTRATIVE CHIPS
A. HIV Chip 

HIV has infected a large and expanding number of people,
resulting in massive health care expenditures. HIV can
rapidly become resistant to drugs used to treat the infection,
primarily due to the action of the heterodimeric protein (51
kDa and 66 kDa) HIV reverse transcriptase (RT), both subunits
of which are encoded by the 1.7 kb pol gene. The high error
rate (5-10 per round) of the RT protein is believed to account
for the hypermutability of HIV. The nucleoside analogues,
i.e., AZT, ddI, ddC, and d4T, commonly used to treat HIV
infection are converted to nucleotide analogues by sequential
phosphorylation in the cytoplasm of infected cells, where
incorporation of the analogue into the viral DNA results in
termination of viral replication, because the 5' -> 3'
phosphodiester linkage cannot be completed. However, after
about 6 months to 1 year of treatment or less, HIV typically
mutates the RT gene so as to become incapable of incorporating
the analogue and so resistant to treatment. Several mutations
known to be associated with drug resistance are shown in the
table below. After a virus having drug resistance via a
mutation becomes predominant, the patient suffers dramatically
increased viral load, worsening symptoms (typically more
frequent and difficult-to-treat infections), and ultimately
death. Switching to a different treatment regimen as soon as
a resistant mutant virus takes hold may be an important step
in patient management which prolongs patient life and reduces
morbidity during life.

TABLE 3

SOME RT MUTATIONS ASSOCIATED WITH DRUG RESISTANCE

ANTIVIRAL      CODON   aa CHANGE            nt CHANGE

AZT            67      Asp -> Asn           GAC -> AAC
AZT            70      Lys -> Arg           AAA -> AGA
AZT            215     Thr -> Phe or Tyr    ACC -> TTC or TAC
AZT            219     Lys -> Gln or Glu    AAA -> CAA or GAA
AZT            41      Met -> Leu           ATG -> TTG or CTG
ddI and ddC    184     Met -> Val           ATG -> GTG
ddI and ddC    74      Leu -> Val
TIBO 82150     100     Leu -> Ile
ddC            65      Lys -> Asn           AAA -> AGA
ddC            69      Thr -> Asp           ACT -> GAT
3TC            184     Met -> Val           ATG -> GTG or GTA
3TC            184     Met -> Ile           ATG -> ATA
AZT + ddI      62      Ala -> Val           GCC -> GTC
AZT + ddI      75      Val -> Ile           GTA -> ATA
AZT + ddI      77      Phe -> Leu           TTC -> TTA
AZT + ddI      116     Phe -> Tyr           TTT -> TAT
AZT + ddI      151     Gln -> Met           CAG -> ATG
Nevirapine     103     Lys -> Asn           AAA -> AAT
               106     Val -> Ala           GTA -> GCA
               108
               181     Tyr -> Cys           TAT -> TGT
               188     Tyr -> His           TAT -> CAT
               190     Gly -> Ala           GGA -> GCA

N.B. Other mutations confer resistance to other drugs.

A second important therapeutic target for anti-HIV drugs
is the aspartyl protease enzyme encoded by the HIV genome,
whose function is required for the formation of infectious
progeny. See Robbins & Plattner, J. Acquired Immune
Deficiency Syndromes 6, 162-170 (1993); Kozal et al., Curr.
Op. Infect. Dis. 7:72-81 (1994). The protease functions in
processing of viral precursor polypeptides to their active
forms. Drugs targeted against this enzyme do not impair
endogenous human proteases, thereby achieving a high degree of
selective toxicity. Moreover, the protease is expressed later
in the life-cycle than reverse transcriptase, thereby offering
the possibility of a combined attack on HIV at two different
times in its life-cycle. As for drugs targeted against the
reverse transcriptase, administration of drugs targeted against
the protease can result in acquisition of drug resistance
through mutation of the protease. By monitoring the protease
gene from patients, it is possible to detect the occurrence of
mutations, and thereby make appropriate adjustments in the
drug(s) being administered.

In addition to being infected with HIV, AIDS patients are
often also infected with a wide variety of other infectious
agents giving rise to a complex series of symptoms. Often
diagnosis and treatment are difficult because many different
pathogens (some life-threatening, others routine) cause
similar symptoms. Some of these infections, so-called
opportunistic infections, are caused by bacterial, fungal,
protozoan or viral pathogens which are normally present in
small quantity in the body, but are held in check by the
immune system. When the immune system in AIDS patients fails,
these normally latent pathogens can grow and generate rampant
infection. In treating such patients, it would be desirable
simultaneously to diagnose the presence or absence of a
variety of the most lethal common infections, determine the
most effective therapeutic regimen against the HIV virus, and
monitor the overall status of the patient's infection.

The present invention provides DNA chips for detecting
the multiple mutations in HIV genes associated with resistance
to different therapeutics. These DNA chips allow physicians
to monitor mutations over time and to change therapeutics if
resistance develops. Some chips also provide probes for
diagnosis of pathogenic microorganisms that typically occur in
AIDS patients.

The sequence selected as a reference sequence can be from
anywhere in the HIV genome, but should preferably cover a
region of the HIV genome in which mutations associated with
drug resistance are known to occur. A reference sequence is
usually between about 5, 10, 20, 50, 100, 500, 1000, 5,000 or
10,000 bases in length, and preferably is about 100-1700 bases
in length. Some reference sequences encompass at least part
of the reverse transcriptase sequence encoded by the pol gene.
Preferably, the reference sequence encompasses all, or
substantially all (i.e., about 75 or 90%) of the reverse
transcriptase gene. Reverse transcriptase is the target of
several drugs and, as noted above, the coding sequence is the
site of many mutations associated with drug resistance. In
some chips, the reference sequence contains the entire region
coding reverse transcriptase (850 bp), and in other chips,
subfragments thereof. In some chips, the reference sequence
includes other subfragments of the pol gene encoding HIV
protease or endonuclease, instead of, or as well as, the
segment encoding reverse transcriptase. In some chips, the
reference sequence also includes other HIV genes such as env
or gag as well as or instead of the reverse transcriptase
gene. Certain regions of the gag and env genes are relatively
well conserved, and their detection provides a means for
identifying and quantifying the amount of HIV virus infecting
a patient. In some chips, the reference sequence comprises an
entire HIV genome.

It is not critical from which strain of HIV the reference
sequence is obtained. HIV strains are classified as HIV-I,
HIV-II or HIV-III, and within these generic groupings there
are several strains and polymorphic variants of each of these.
BRU, SF2, HXB2, HXB2R are examples of HIV-1 strains, the
sequences of which are available from GenBank. The reverse
transcriptase genes of the BRU and SF2 strains differ at 23
nucleotides. The HXB2 and HXB2R strains have the same reverse
transcriptase gene sequence, which differs from that of the
BRU strain at four nucleotides, and that of SF2 by 27
nucleotides. In some chips, the reference sequence
corresponds exactly to the reverse transcriptase sequence in
the wildtype version of a strain. In other chips, the
reference sequence corresponds to a consensus sequence of
several HIV strains. In some chips, the reference sequence
corresponds to a mutant form of an HIV strain.

Chips are designed in accordance with the tiling
strategies noted above. The probes are designed to be
complementary to either the coding or noncoding strand of the
HIV reference sequence. If only one strand is to be read, it
is preferable to read the coding strand. The greater
percentage of A residues in this strand relative to the
noncoding strand generally results in fewer regions of
ambiguous sequence.

Some chips contain additional probes or groups of probes
designed to be complementary to a second reference sequence.
The second reference sequence is often a subsequence of the
first reference sequence bearing one or more commonly
occurring HIV mutations or interstrain variations (e.g.,
within codons 67, 70, 215 or 219 of the reverse transcriptase
gene). The inclusion of a second group is particularly useful
for analyzing short subsequences of the primary reference
sequence in which multiple mutations are expected to occur
within a short distance commensurate with the length of the
probes (i.e., two or more mutations within 9 to 21 bases).

The total number of probes on the chips depends on the
tiling strategy, the length of the reference sequence and the
options selected with respect to inclusion of multiple probe
lengths and secondary groups of probes to provide confirmation
of the existence of common mutations. To read much or all of
the HIV reverse transcriptase gene (857 b for the BRU strain),
chips tiled by the basic strategy typically contain at least
857 x 4 = 3428 probes.
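
The probe count above follows from the basic tiling strategy
having four probes per interrogated position. A minimal Python
sketch is given below under stated assumptions: the function names
basic_tiling and probe_count are hypothetical, probes are written
as the base-by-base complement aligned under the reference (not
reversed), and only windows fully contained in the reference are
tiled. The short test fragment is the reference fragment quoted
earlier in the deletion tiling example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def basic_tiling(reference, probe_length, interrogation_index):
    """Return one probe set (four probes keyed by base) per reference window.

    interrogation_index is the 0-based offset of the interrogation
    position within each probe.
    """
    probe_sets = []
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        # Complement written in the same left-to-right order as the reference.
        perfect = "".join(COMPLEMENT[base] for base in window)
        probe_set = {}
        for base in "ACGT":
            probe_set[base] = (
                perfect[:interrogation_index]
                + base
                + perfect[interrogation_index + 1:]
            )
        probe_sets.append(probe_set)
    return probe_sets


def probe_count(reference_length, probes_per_position=4):
    """Minimum number of probes to interrogate every reference position."""
    return reference_length * probes_per_position


fragment = "AGTACCAGATCTCTAA"
sets = basic_tiling(fragment, probe_length=11, interrogation_index=5)
print(len(sets), sets[0])
print(probe_count(857))
```

In each probe set exactly one probe is fully complementary to the
reference window; the other three differ from it only at the
interrogation position.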

The target HIV polynucleotide, whose sequence is to be
determined, is usually isolated from blood samples (peripheral
blood lymphocytes or PBMC) in the form of RNA. The RNA is
reverse transcribed to DNA, and the DNA product is then
amplified. Depending on the selection of primers and
amplifying enzyme, the amplification product can be RNA or
DNA. Suitable primers for amplification of target are shown
in the table below.




TABLE 4
AMPLIFICATION OF TARGET

TARGET
SIZE      FORWARD PRIMER                          REVERSE PRIMER

1, Op     rrr AG A A TTrTGTTG A CTC A G ATTG G    GATAAGCTTGGGCCTTATCTATTCCAT

535 bp    AAATCCATACAATACTCCAGTATTTGC             ACCCATCCAAAGGAATGGAGGTTCTTTC

323 bp    Genbank # K02013 1889-1908              bases 2211-2192

          AATTAACCCTCACTAAAGGGAga                 AATTTAATACGACTCACTATAGGGAtttcccca
          ggaagaatctgttgactcagattggt (RT01-T3)    ctaacttctgtatgtcattgaca-3' (89-391 T7)

          AATTAACCCTCACTAAAGGGAga
          agtatactgcattaccatacciagta (RT03-T3)

          TAATACGACTCACTATAGGGAGA
          tcgacgcaggactcggcttgctgaa (HV1-T2)

          AATTAACCCTCACTAAAGGGAGA
          ccttgtaagtcattggtcttaaaggta (HV2-T3)

In another aspect of the invention, chips are provided
for simultaneous detection of HIV and microorganisms that
commonly parasitize AIDS patients (e.g., cytomegalovirus
(CMV), Pneumocystis carinii (PCP), fungi (Candida albicans),
mycobacteria). Non-HIV viral pathogens are detected and their
drug resistance determined using a strategy similar to that
used for HIV. That is, groups of probes are designed to show
complementarity to a target sequence from a region of the
genome of a non-HIV viral pathogen known to be associated with
acquisition of drug resistance. For example, CMV and HSV
viruses, which frequently co-parasitize AIDS patients, undergo
mutations to acquire resistance to acyclovir.

For detection of non-viral pathogens, the chips include
an array of probes which allow full-sequence determination of
16S ribosomal RNA or corresponding genomic DNA of the
pathogens. The additional probes are designed by the same
principles as described above except that the target sequence
is a variable region from a 16S RNA (or corresponding DNA) of
a pathogenic microorganism. Alternatively, the target
sequence can be a consensus sequence of variable 16S rRNA
regions from multiple organisms. 16S ribosomal DNA and RNA are
present in all organisms (except viruses) and the sequence of
the DNA or RNA is closely related to the evolutionary genetic
distance between any two species. Hence, organisms which are
quite close in type (e.g., all mycobacteria) share a common
region of 16S rDNA, and differ in other regions (variable
regions) of the 16S rRNA. These differences can be exploited
to allow identification of the different subtype strains. The
full sequence of 16S ribosomal RNA or DNA read from the chip
is compared against a database of the sequence of thousands of
known pathogens to type unambiguously most nonviral pathogens
infecting AIDS patients.
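
The database comparison described above can be pictured with a
short Python sketch. Everything in it is an assumption made for
illustration: the function names are hypothetical, the sequences
are invented placeholders rather than real 16S data, and a simple
mismatch count over equal-length sequences stands in for whatever
comparison method would be used in practice.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_match(read, database):
    """Return (name, mismatch count) of the database entry nearest to read."""
    name = min(database, key=lambda entry: mismatches(read, database[entry]))
    return name, mismatches(read, database[name])


# Placeholder variable-region sequences, invented for illustration only.
toy_database = {
    "organism_1": "ACGTTGCAAGCT",
    "organism_2": "ACGTAGCTAGCT",
    "organism_3": "TCGTTGCAAGGA",
}
read_from_chip = "ACGTAGCTAGCA"

result = closest_match(read_from_chip, toy_database)
print(result)
```

A real comparison would also need to handle reads of unequal
length and positions that the chip could not call.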

In a further embodiment, the invention provides chips
which also contain probes for detection of bacterial genes
conferring antibiotic resistance. An antibiotic resistance
gene can be detected by hybridization to a single probe
employed in a reverse dot blot format. Alternatively, a group
of probes can be designed according to the same principles
discussed above to read all or part of the DNA sequence encoding
an antibiotic resistance gene. Analogous probe groups are
designed for reading other antibiotic resistance gene
sequences. Antibiotic resistance frequently resides in one of
the following genes in microorganisms coparasitizing AIDS
patients: rpoB (encoding RNA polymerase), katG (encoding
catalase peroxidase), and DNA gyrase A and B genes.

The inclusion of probes for combinations of tests on a 
single chip simulates the clinical diagnosis tree that a 
physician would follow based on the presentation of a given 
syndrome which could be caused by any number of possible 
pathogens. Such chips allow identification of the presence 
and titer of HIV in a patient, identification of the HIV 
strain type and drug resistance, identification of 
opportunistic pathogens, and identification of the drug 
resistance of such pathogens. Thus, the physician is 
simultaneously apprised of the full spectrum of pathogens 
infecting the patient and the most effective treatments 
therefor.

Exemplary HIV Chips
(a) HV 273 

The HV 273 chip contains an array of oligonucleotide
probes for analysis of an 857 base HIV amplicon between
nucleotides 2090 and 2946 (HIVBRU strain numbering). The chip
contains four groups of probes: 11 mers, 13 mers, 15 mers and
17 mers. From top to bottom, the HV 273 chip is occupied by
rows of 11 mers, followed by rows of 13 mers, followed by rows
of 15 mers, followed by rows of 17 mers. The interrogation
position is nucleotide 6, 7, 8 and 9 respectively in the
different sized probes. This arrangement of the different
sized probes is referred to as being "in series." Within each
size group, there are four probe sets laid down in an A-lane,
a C-lane, a G-lane and a T-lane respectively. Each lane
contains an overlapping series of probes with one probe for
each nucleotide in the 2090-2946 HIV reverse transcriptase
reference sequence (i.e., 857 probes per lane). The lanes
also include a few column positions which are empty or
occupied by control probes. These positions serve to orient
the chip, determine background fluorescence and punctuate
different subsequences within the target. The chip has an area
of 1.28 x 1.28 cm, within which the probes form a 130 x 135
matrix (17,550 cells total). The area occupied by each probe
(i.e., a probe cell) is about 98 x 95 microns.
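
The layout figures quoted for the HV 273 chip can be checked with
simple arithmetic, sketched here in Python. The variable names are
hypothetical, and the sketch assumes that each probe occupies
exactly one cell, so that the cells left over after tiling are the
empty and control positions mentioned above.

```python
CHIP_SIDE_CM = 1.28
CELLS_DIM_1, CELLS_DIM_2 = 130, 135
PROBE_LENGTHS = (11, 13, 15, 17)
LANES = ("A", "C", "G", "T")
REFERENCE_LENGTH = 857  # nucleotides 2090-2946, inclusive

total_cells = CELLS_DIM_1 * CELLS_DIM_2
tiling_probes = len(PROBE_LENGTHS) * len(LANES) * REFERENCE_LENGTH
remaining_cells = total_cells - tiling_probes  # assumed empty/control cells

chip_side_um = CHIP_SIDE_CM * 10_000  # 1 cm = 10,000 microns
cell_size_dim_1_um = chip_side_um / CELLS_DIM_1
cell_size_dim_2_um = chip_side_um / CELLS_DIM_2

# For odd probe lengths the stated interrogation position (1-based)
# coincides with the central nucleotide of the probe.
interrogation = {n: (n + 1) // 2 for n in PROBE_LENGTHS}

print(total_cells, tiling_probes, remaining_cells)
print(round(cell_size_dim_1_um), round(cell_size_dim_2_um))
print(interrogation)
```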

The chip was tested for its capacity to sequence a
reverse transcriptase fragment from the HIV strain SF2. An
831 bp RNA fragment (designated pPol19) spanning most of the
HIV reverse transcriptase coding sequence was amplified by
PCR, using primers tagged with T3 and T7 promoter sequences.
The primers, designated RT#1-T3 and 89-391 T7, are shown in
Table 4; see also Gingeras et al., J. Inf. Dis. 164, 1066-1074
(1991) (incorporated by reference in its entirety for all
purposes). RNA was labelled by incorporation of fluorescent
nucleotides. The RNA was fragmented by heating and hybridized
to the chip for 40 min at 30 degrees. Hybridization signals
were quantified by fluorescence imaging.

Taking the best data from the four probe sets at each
position in the target sequence, 715 out of 821 bases were
read correctly (87%). (Comparisons are based on the sequence
of pPol19 determined by the conventional dideoxy method to be
identical to SF2.) In general, the longer sized probes
yielded more sequence than the shorter probes. Of the 21
positions at which the SF2 and BRU strains diverged within the
target, 19 were read correctly.

Many of the short ambiguous regions in the target arise 
in segments of the target flanking the points at which the SF2 
and BRU sequences diverge. These ambiguities arise because in 
these regions the comparison of hybridization signals is not 
drawn between perfectly matched and single base mismatch 
probes but between a single-mismatched probe and three probes 
having two mismatches. These ambiguities in reading an SF2 
sequence would not detract from the chip's ability to read a 
BRU sequence either alone or in a mixture with an SF2 target 
sequence. 

In a variation of the above procedure, the chip was
treated with RNase after hybridization of the pPol19 target to
the probes. Addition of RNase digests mismatched target and
thereby increases the signal to noise ratio. RNase treatment
increased the number of correctly read bases to 743/821 or 90%
(combining the data from the four groups of probes).
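
The read accuracies quoted for the HV 273 chip are plain ratios of
correctly read bases to bases compared. A small Python check is
given below; the function name percent_correct is hypothetical and
the counts are those stated in the text.

```python
def percent_correct(correct, total):
    """Percentage of bases read correctly."""
    return 100.0 * correct / total


TOTAL_BASES = 821
without_rnase = percent_correct(715, TOTAL_BASES)
with_rnase = percent_correct(743, TOTAL_BASES)
additional_bases = 743 - 715

print(round(without_rnase), round(with_rnase), additional_bases)
```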

In a further variation, the RNA target was replaced with
a DNA target containing the same segment of the HIV genome.
The DNA target was prepared by linear amplification using Taq
polymerase, RT#1-T3 primer, and fluorescein d-UTP label. The
DNA target was fragmented with uracil DNA glycosylase and heat
treatment. The hybridization pattern across the array and
percentage of readable sequence were similar to those obtained
using an RNA target. However, there were a few regions of
sequence that could be read from the RNA target that could not
be read from the DNA target and vice versa.

(b) HV 407 Chip

The HV 407 chip was designed according to the same
principles as the HV 273 chip, but differs in several
respects. First, the oligonucleotide probes on this chip are
designed to exhibit perfect sequence identity (with the
exception of the interrogation position on each probe) to the
HIV strain SF2 (rather than the BRU strain as was the case for
the HV 273 chip). Second, the HV 407 chip contains 13 mers, 15
mers, 17 mers and 19 mers (with interrogation positions at
nucleotide 7, 8, 9 and 10 respectively), rather than the 11
mers, 13 mers, 15 mers and 17 mers on the HV 273 chip. Third,
the different sized groups of oligomers are arranged in
parallel in place of the in-series arrangement on the HV 273
chip. In the parallel arrangement, the chip contains from top
to bottom a row of 13 mers, a row of 15 mers, a row of 17
mers, a row of 19 mers, followed by a further row of 13 mers,
a row of 15 mers, a row of 17 mers, a row of 19 mers, followed
by a row of 13 mers, and so forth. Each row contains 4 lanes
of probes, an A lane, a C lane, a G lane and a T lane, as
described above. The probes in each lane tile across the
reference sequence. The layout of probes on the HV 407 chip is
shown in Fig. 10.

The HV 407 chip was separately tested for its ability to
sequence two targets, pPol19 RNA and 4MUT18 RNA. pPol19
contains an 831 bp fragment from the SF2 reverse transcriptase
gene which exhibits perfect complementarity to the probes on
the HV 407 chip (except of course for the interrogation
positions in three of the probes in each column). 4MUT18
differs from the reference sequence at thirty-one positions
within the target, including five positions in codons 67, 70,
215 and 219 associated with acquisition of drug resistance.
Target RNA was prepared, labelled and fragmented as described
above and hybridized to the HV 407 chip. The hybridization
pattern for the pPol19 target is shown in Fig. 11.

The sequences read off the chip for the pPol19 and 4MUT18
targets are both shown in Fig. 12 (although the two sequences
were determined in different experiments). The sequence
labelled wildtype in the Figure is the reference sequence.
The four lanes of sequence immediately below the reference
sequence are the respective sequences read from the four sized
groups of probes for the pPol19 target (from top to bottom, 13
mers, 15 mers, 17 mers and 19 mers). The next four lanes of
sequence are the sequences read from the four sized groups of
probes for the 4MUT18 target (from top to bottom in the same
order). The regions of sequences shown in normal type are
those that could be read unambiguously from the chip. Regions
where sequence could not be accurately read are shown
highlighted. Some regions of sequence that could not be read
from one sized set of probes could be read from another.

Taking the best result from the four sized groups of
probes at each column position, about 97% of bases in the
pPol19 sequence and about 90% of bases in the 4MUT18 sequence
were read accurately. Of the 31 nucleotide differences
between 4MUT18 and the reference sequence, twenty-seven were
read correctly, including three of the nucleotide changes
associated with acquisition of drug resistance. Of the
ambiguous regions in the 4MUT18 sequence determination, most
occurred in the 4MUT18 segments flanking points of divergence
between the 4MUT18 and reference sequences. Notably, most of
the common mutations in HIV reverse transcriptase associated
with drug resistance (see Table 3) occur at sequence positions
that can be read from the chip. Thus, most of the commonly
occurring mutations can be detected by a chip containing an
array of probes based on a single reference sequence.

Comparison of the sequences read from probes of
different sizes is useful in determining the optimum size
probe to use for different regions of the target. The
strategy of customizing probe length within a single group of
probe sets minimizes the total number of probes required to
read a particular target sequence. This leaves ample capacity
for the chip to include probes to other reference sequences
(e.g., 16S RNA for pathogenic microorganisms) as discussed
below.

The HV 407 chip has also been tested for its capacity to
detect mixtures of different HIV strains. The mixture
comprises varying proportions of two target sequences: one a
segment of a reverse transcriptase gene from a wildtype SF2
strain, the other a corresponding segment from an SF2 strain
bearing a codon 67 mutation. See Fig. 13. The Figure also
represents the probes on the chip having an interrogation
position for reading the nucleotide in which the mutation
occurs. A single probe in the Figure represents four probes
on the chip with the symbol (o) indicating the interrogation
position, which differs in each of the four probes. Figure 14
shows the fluorescence intensity for the four 13 mers and the
four 15 mers having an interrogation position for reading the
nucleotide in the target sequence in which the mutation
occurs. As the percentage of mutant target is increased, the
fluorescence intensity of the probe exhibiting perfect
complementarity to the wildtype target decreases, and the
intensity of the probe exhibiting perfect complementarity to
the mutant sequence increases. The intensities of the other
two probes do not change appreciably. It is concluded that
the chip can be used to analyze simultaneously a mixture of
strains, and that a strain comprising as little as ten percent
of a mixture can be easily detected.

(c) Protease Chip

A protease chip was constructed using the basic tiling
strategy. The chip comprises four probe sets tiling across a
382 nucleotide span including 297 nucleotides from the protease
coding sequence. The reference sequence was a consensus
clade B HIV protease sequence. Different probe lengths were
employed for tiling different regions of the reference
sequence. Probe lengths were 11, 14, 17 and 20 nucleotides
with interrogation positions at or adjacent to the center of
each probe. Lengths were optimized from prior hybridization
data employing a chip having multiple tilings, each with a
different probe length.

The chip was hybridized to four different single-stranded
DNA protease target sequences (HXB2, SF2, NY5, pPol4mut18).
Both sense and antisense strands were sequenced. Data from
the chip was compared with that from an ABI sequencer. The
overall accuracy from sequencing the four targets is
illustrated in Table 5 below.

Table 5

                     ABI                 Protease Chip
               Sense   Antisense       Sense   Antisense

No call          0         4             9         4
Ambiguous        6        14            17         8
Wrong call       2         3             3         1
TOTAL            8        21            29        13

ABI (sense) - 99.5%
Chip (sense) - 98.1%

ABI (antisense) - 98.6%
Chip (antisense) - 99.1%

Combining the data from sense and antisense strands, both the
chip and the ABI sequencer provided 100% accurate data for all
of the sequence from all four clones.
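
The percentages listed under Table 5 can be reproduced from the
error totals with the Python sketch below. One assumption is made
that the text does not state explicitly: the denominator is taken
to be 4 targets x 382 nucleotides = 1528 bases per strand. With
that assumption, the computed values round to the reported
figures. The names used are hypothetical.

```python
TARGETS = 4
SPAN_NT = 382
TOTAL_BASES = TARGETS * SPAN_NT  # assumed denominator: 1528 bases

errors = {
    ("ABI", "sense"): {"no_call": 0, "ambiguous": 6, "wrong_call": 2},
    ("ABI", "antisense"): {"no_call": 4, "ambiguous": 14, "wrong_call": 3},
    ("Chip", "sense"): {"no_call": 9, "ambiguous": 17, "wrong_call": 3},
    ("Chip", "antisense"): {"no_call": 4, "ambiguous": 8, "wrong_call": 1},
}


def accuracy_percent(error_counts, total_bases=TOTAL_BASES):
    """Percentage of bases that were neither uncalled, ambiguous nor wrong."""
    return 100.0 * (1 - sum(error_counts.values()) / total_bases)


totals = {key: sum(counts.values()) for key, counts in errors.items()}
accuracy = {key: round(accuracy_percent(counts), 1) for key, counts in errors.items()}

for key in errors:
    print(key, totals[key], accuracy[key])
```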

In a further test, the chip was hybridized to protease
target sequences from viral isolates obtained from four
patients before and after ddI treatment. The sequence read
from the chip is shown in Fig. 15. Several mutations
(indicated by arrows) have arisen in the samples obtained
posttreatment. Particularly noteworthy was the chip's
capacity to read a g/a mutation at nucleotide 207,
notwithstanding the presence of two additional mutations (gt)
at adjacent positions.

B. Cystic Fibrosis Chips 

A number of years ago, cystic fibrosis, the most common
severe autosomal recessive disorder in humans, was shown to be
associated with mutations in a gene thereafter named the
Cystic Fibrosis Transmembrane Conductance Regulator (CFTR)
gene. The CFTR gene is about 250 kb in size and has 27 exons.
Wildtype genomic sequence is available for all exonic regions
and exon/intron boundaries (Zielenski et al., Genomics 10,
214-228 (1991)). The full-length wildtype cDNA sequence has
also been described (see Riordan et al., Science 245,
1059-1065 (1989)). Over 400 mutations have been mapped (see
Tsui et al., Hu. Mutat. 1, 197-203 (1992)). Many of the more
common mutations are shown in Table 6. The most common cystic
fibrosis mutation is a three-base deletion resulting in the
omission of amino acid #508 from the CFTR protein. The
frequency of mutations varies widely in populations of
different geographic or ethnic origin (see column 4 of
Table 6). About 90% of all mutations having phenotypic
effects occur in coding regions.

Detection of CFTR mutations is useful in a number of
respects. For example, screening of populations can identify
asymptomatic heterozygous individuals. Such individuals are
at risk of giving rise to affected offspring suffering from CF
if they reproduce with other such individuals. In utero
screening of fetuses is also useful in identifying fetuses
bearing two CFTR mutations. Identification of such mutations
offers the possibility of abortion or gene therapy. For
couples known to be at risk of giving rise to affected
progeny, diagnosis can be combined with in vitro reproduction
procedures to identify an embryo having at least one wildtype
CF allele before implantation. Screening children shortly
after birth is also of value in identifying those having
two copies of the defective gene. Early detection allows
administration of appropriate treatment (e.g., Pulmozyme,
antibiotics, percussive therapy), thereby improving the quality
of life and perhaps prolonging the life expectancy of an
individual.

The source of target DNA for detection of CFTR mutations
is usually genomic. In adults, samples can conveniently be
obtained from blood or mouthwash epithelial cells. In
fetuses, samples can be obtained by several conventional
techniques such as amniocentesis, chorionic villus sampling or
fetal blood sampling. At birth, blood from the umbilical cord
is a useful tissue source.

The target DNA is usually amplified by PCR. Some 
appropriate pairs of primers for amplifying segments of DNA 
including the sites of known mutations are listed in Tables 5 

and 6.

Table 7

OLIGO NUMBER   SEQUENCE

787            TCTCCTTGGATATACTTGTGTGAATCAA
788            TCACCAGATTTCGTAGTCTTTTCATA
851            GTCTTGTGTTGAAATTCTCAGGGTAT
769            CTTGTACCAGCTCACTACCTAAT
887            ACCTGAGAAGATAGTAAGCTAGATGAA
888            AACTCCGCCTTTCCAGTTGTAT
934            TTAGTTTCTAGGGGTGGAAGATACA
935            TTAATGACACTGAAGATCACTGTTCTAT
789            CCATTCCAAGATCCCTGATATTTGAA
790            GCACATTTTTGCAAAGTTCATTAGA
891            TCATGGGCCATGTGCTTTTCAA
892            ACCTTCCAGCACTACAAACTAGAA
760            CAAGTGAATCCTGAGCGTGATTT
850            GGTAGTGTGAAGGGTTCATATGCATA
762            GATTACATTAGAAGGAAGATGTGCCTTT
763            ACATGAATGACATTTACAGCAAATGCTT
931            GTGACCATATTGTAATGCATGTAGTGA
932            ATGGTGAACATATTTCTCAAGAGGTAA
955            TGTCTCTGTAAACTGATGGCTAACA
884            TCGTATAGAGTTGATTGGATTGAGAA
885            CCATTAACTTAATGTGGTCTCATCACAA
886            CTACCATAATGCTTGGGAGAAATGAA
782            TCAAAGAATGGCACCAGTGTGAAA
901            TGCTTAGCTAAAGTTAATGAGTTCAT
784            AATTGTGAAATTGTCTGCCATTCTTAA
785            GATTCACTTACTGAACACAGTCTAACAA
791            AGGCTTCTCAGTGATCTGTTG
792            GAATCATTCAGTGGGTATAAGCA
1013           GCCATGGTACCTATATGTCACAGAA
1012           TGCAGAGTAATATGAATTTCTTGAGTACA
766            GGGACTCCAAATATTGCTGTAGTAT
1065           GTACCTGTTGCTCCAGGTATGTT



Other primers can be readily devised from the known 
genomic and cDNA sequences of CFTR. The selection of 
primers, of course, depends on the areas of the target 
sequence that are to be screened. The choice of primers also 
depends on the strand to be amplified. For some regions of 
the CFTR gene, it makes little difference to the hybridization 
signal whether the coding or noncoding strand is used. In 
other regions, one strand may give better discrimination in 
hybridization signals between matched and mismatched probes 
than the other. The upper limit in the length of a segment 
that can be amplified from one pair of PCR primers is about 50 
kb. Thus, for analysis of mutants through all or much of the 
CFTR gene, it is often desirable to amplify several segments 
from several paired primers. The different segments may be 
amplified sequentially or simultaneously by multiplex PCR. 
Frequently, fifteen or more segments of the CFTR gene are 
simultaneously amplified by PCR. The primers and
amplification conditions are preferably selected to generate
DNA targets. An asymmetric labelling strategy incorporating
fluorescently labelled dNTPs for random labelling and dUTP for
target fragmentation to an average length of less than 60
bases is preferred. The use of dUTP and fragmentation with
uracil N-glycosylase has the added advantage of eliminating
carryover between samples.

Mutations in the CFTR gene can be detected by any of the
tiling strategies noted above. The block tiling strategy is
one particularly useful approach. In this strategy, a group
(or block) of probes is used to analyze a short segment of
contiguous nucleotides (e.g., 3, 5, 7 or 9) from a CFTR gene
centered around the site of a mutation. The probes in a group
are sometimes referred to as constituting a block because all
probes in the group are usually identical except at their
interrogation positions. As noted above, the probes may also
differ in the presence of leading or trailing sequences
flanking regions of complementarity. However, for ease of
illustration, it will be assumed that such sequences are not
present. As an example, to analyze a segment of five
contiguous nucleotides from the CFTR gene, including the site
of a mutation (such as one of the mutations in Table 6), a
block of probes usually contains at least one wildtype probe
and five sets of mutant probes, each having three probes. The
wildtype probe has five interrogation positions corresponding
to the five nucleotides being analyzed from the reference
sequence. However, the identity of the interrogation
positions is only apparent when the structure of the wildtype
probe is compared with that of the probes in the five mutant
probe sets. The first mutant probe set comprises three
probes, each being identical to the wildtype probe, except in
the first interrogation position, which differs in each of the
three mutant probes and the wildtype probe. The second
through fifth mutant probe sets are similarly composed except
that the differences from the wildtype probe occur in the
second through fifth interrogation positions, respectively.
Note that in practice, each set of mutant probes is sometimes
laid down on the chip juxtaposed with an associated wildtype
probe. In this situation, a block would comprise five
wildtype probes, each effectively providing the same
information. However, visual inspection and confidence
analysis of the chip is facilitated by the largely redundant
information provided by five wildtype probes.

After hybridization to labelled target, the relative
hybridization signals are read from the probes. Comparison of
the intensities of the three probes in the first mutant probe
set with that of the wildtype probe indicates the identity of
the nucleotide in the target sequence corresponding to the
first interrogation position. Comparison of the intensities
of the three probes in the second mutant probe set with that
of the wildtype probe indicates the identity of the nucleotide
in the target sequence corresponding to the second
interrogation position, and so forth. Collectively, the
relative hybridization intensities indicate the identity of
each of the five contiguous nucleotides in the reference
sequence.
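
The block layout and the intensity comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to make the chips: the function names block_tiling and call_bases, the 0-based position indexing, and the rule "take the brightest of the four probes" are all hypothetical simplifications introduced here.

```python
BASES = "ACGT"


def block_tiling(wildtype_probe, interrogation_positions):
    """Build one block: a wildtype probe plus one mutant probe set per
    interrogation position (0-based index into the probe).

    Each mutant set holds the three probes that differ from the wildtype
    probe only at that interrogation position.
    """
    mutant_sets = []
    for pos in interrogation_positions:
        probes = [
            wildtype_probe[:pos] + base + wildtype_probe[pos + 1:]
            for base in BASES
            if base != wildtype_probe[pos]
        ]
        mutant_sets.append(probes)
    return wildtype_probe, mutant_sets


def call_bases(wildtype_probe, mutant_sets, interrogation_positions, intensity):
    """For each interrogation position, compare the wildtype probe with the
    three mutant probes and report the probe base of the brightest one.

    `intensity` maps probe sequence -> measured hybridization signal.
    The target base is the complement of the reported probe base.
    """
    calls = []
    for pos, probes in zip(interrogation_positions, mutant_sets):
        candidates = [wildtype_probe] + probes
        best = max(candidates, key=lambda probe: intensity[probe])
        calls.append(best[pos])
    return "".join(calls)
```

For a five-position block this yields one wildtype probe and fifteen mutant probes, matching the counts given in the text. A real analysis would also weigh how far the brightest probe stands above the other three before making a call.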

In a preferred embodiment, a first group (or block) of 
probes is tiled based on a wildtype reference sequence and a 
second group is tiled based on a mutant version of the wildtype
reference sequence. The mutation can be a point mutation, 
insertion or deletion or any combination of these. The 
combination of first and second groups of probes facilitates 
analysis when multiple target sequences are simultaneously 
applied to the chip, as is the case when a patient being 
diagnosed is heterozygous for the CFTR allele. 

The above strategy is illustrated in Fig. 16, which shows
two groups of probes tiled for a wildtype reference sequence
and a point mutation thereof. The five mutant probe sets for
the wildtype reference sequence are designated wt1-5, and the
five mutant probe sets for the mutant reference sequence are
designated m1-5. The letter N indicates the interrogation
position, which shifts by one position in successive probe
sets from the same group. The figure illustrates the
hybridization pattern obtained when the chip is hybridized
with a homozygous wildtype target sequence comprising
nucleotides n-2 to n+2, where n is the site of a mutation.
For the group of probes tiled based on the reference sequence,
four probes are compared at each interrogation position. At
each position, one of the four probes exhibits a perfect match
with the target, and the other three exhibit a single-base
mismatch. For the group of probes tiled based on the mutant
reference sequence, again four probes are compared at each
interrogation position. At position n, one probe exhibits a
perfect match, and three probes exhibit a single-base
mismatch. Hybridization to a homozygous mutant yields an
analogous pattern, except that the respective hybridization
patterns of probes tiled on the wildtype and mutant reference
sequences are reversed.

The hybridization pattern is very different when the chip
is hybridized with a sample from a patient who is heterozygous
for the mutant allele (see Fig. 17). For the group of probes
tiled based on the wildtype sequence, at all positions but n,
one probe exhibits a perfect match at each interrogation
position, and the other three probes exhibit a one-base
mismatch. At position n, two probes exhibit a perfect match
(one for each allele), and the other probes exhibit
single-base mismatches. For the group of probes tiled on the
mutant sequence, the same result is obtained. Thus, the
heterozygote point mutant is easily distinguished from both
the homozygous wildtype and mutant forms by the identity of
hybridization patterns from the two groups of probes.

Typically, a chip comprises several paired groups of
probes, each pair for detecting a particular mutation. For
example, some chips contain 5, 10, 20, 40 or 100 paired groups
of probes for detecting the corresponding numbers of
mutations. Some chips are customized to include paired groups
of probes for detecting all mutations common in particular
populations (see Table 6). Chips usually also contain control
probes for verifying that correct amplification has occurred
and that the target is properly labelled.

The goal of the tiling strategy described above is to
focus on short regions of the CFTR gene flanking the sites
of known mutation. Other tiling strategies analyze much
larger regions of the CFTR gene, and are appropriate for
locating and identifying hitherto uncharacterized mutations.
For example, the entire genomic CFTR gene (250 kb) can be
tiled by the basic tiling strategy from an array of about one
million probes. Synthesis and scanning of such an array of
probes is entirely feasible. Other tiling strategies, such as
the block tiling, multiplex tiling or pooling can cover the
entire gene with fewer probes. Some tiling strategies analyze
some or all components of the CFTR gene, such as the cDNA
coding sequence or individual exons. Analysis of exons 10 and
11 is particularly informative because these are the locations
of many common mutations including the ΔF508 mutation.
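
The figure of about one million probes follows from simple arithmetic if one assumes, as in the four-probe comparison described earlier, that the basic tiling strategy places four probes (one per nucleotide) at each position of the reference sequence. The helper name below is hypothetical and the calculation is only an illustration of that assumption.

```python
def basic_tiling_probe_count(reference_length, probes_per_position=4):
    """Probes needed when every reference position is interrogated by one
    probe for each of A, C, G and T (an assumed model of basic tiling)."""
    return reference_length * probes_per_position


# 250 kb genomic CFTR gene, four probes per position
print(basic_tiling_probe_count(250_000))  # prints 1000000
```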
Exemplary CFTR chips 

One illustrative chip bears an array of 1296 probes
covering the full length of exon 10 of the CFTR gene arranged
in a 36 x 36 array of 356 µm elements. The probes in the
array can have any length, preferably in the range of from 10
to 18 residues and can be used to detect and sequence any
single-base substitution and any deletion within the 192-base
exon, including the three-base deletion known as ΔF508. As
described in detail below, hybridization of nanomolar
concentrations of wild-type and ΔF508 oligonucleotide target
nucleic acids labeled with fluorescein to these arrays
produces highly specific signals (detected with confocal
scanning fluorescence microscopy) that permit discrimination
between mutant and wild-type target sequences in both
homozygous and heterozygous cases.

Sets of probes of a selected length in the range of from
10 to 18 bases and complementary to subsequences of the known
wild-type CFTR sequence are synthesized starting at a position
a few bases into the intron on the 5'-side of exon 10 and
ending a few bases into the intron on the 3'-side. There is a
probe for each possible subsequence of the given segment of
the gene, and the probes are organized into a "lane" in such a
way that traversing the lane from the upper left-hand corner
of the chip to the lower right-hand corner corresponds to
traversing the gene segment base-by-base from the 5'-end. The
lane containing that set of probes is, as noted above, called
the "wild-type lane."

Relative to the wild-type lane, a "substitution" lane,
called the "A-lane", was synthesized on the chip. The A-lane
probes were identical in sequence to an adjacent (immediately
below the corresponding) wild-type probe but contained,
regardless of the sequence of the wild-type probe, a dA
residue at position 7 (counting from the 3'-end). In similar
fashion, substitution lanes with replacement bases dC, dG, and
dT were placed onto the chip in a "C-lane," a "G-lane," and a
"T-lane," respectively. A sixth lane on the chip consisted of
probes identical to those in the wild-type lane but for the
deletion of the base in position 7 and restoration of the
original probe length by addition to the 5'-end of the base
complementary to the gene at that position.

The four substitution lanes enable one to deduce the
sequence of a target exon 10 nucleic acid from the relative
intensities with which the target hybridizes to the probes in
the various lanes. Various versions of such exon 10 DNA chips
were made as described above with probes 15 bases long, as
well as chips with probes 10, 14, and 18 bases long. For the
results described below, the probes were 15 bases long, and
the position of substitution was 7 from the 3'-end.

The sequences of several important probes are shown
below. In each case, the letter "X" stands for the
interrogation position in a given column set, so each of the
sequences actually represents four probes, with A, C, G, and
T, respectively, taking the place of the "X." Sets of shorter
probes derived from the sets shown below by removing up to
five bases from the 5'-end of each probe and sets of longer
probes made from this set by adding up to three bases from the
exon 10 sequence to the 5'-end of each probe, are also useful
and provided by the invention.
3'-TTTATAXTAGAAACC
3'-TTATAGXAGAAACCA
3'-TATAGTXGAAACCAC
3'-ATAGTAXAAACCACA
3'-TAGTAGXAACCACAA
3'-AGTAGAXACCACAAA
3'-GTAGAAXCCACAAAG
3'-TAGAAAXCACAAAGG
3'-AGAAACXACAAAGGA
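
The relationship between a target sequence and the wild-type and substitution lanes can be sketched as follows. This is a minimal illustration, assuming 15-base probes written 3' to 5' with the substitution at position 7 from the 3'-end, as described above; the function name tile_lanes and the dictionary layout are hypothetical, and the deletion lane is omitted. The 39-base sequence used is the wt508 target given later in this section.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement (no reversal), so a probe written 3'->5'
    lines up directly under a target written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in seq)


def tile_lanes(target, probe_len=15, sub_pos=7):
    """Yield one column set per window of the target (written 5'->3').

    Each column set maps a lane name to a probe written 3'->5':
    "wt" is the perfect complement of the window, and "A", "C", "G", "T"
    carry that base at position sub_pos (1-based, from the 3'-end).
    """
    idx = sub_pos - 1
    for start in range(len(target) - probe_len + 1):
        wt = complement(target[start:start + probe_len])
        column = {"wt": wt}
        for base in "ACGT":
            column[base] = wt[:idx] + base + wt[idx + 1:]
        yield column


wt508 = "CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA"
columns = list(tile_lanes(wt508))
probe = columns[9]["wt"]
print(probe[:6] + "X" + probe[7:])  # prints TTTATAXTAGAAACC
```

In each column set one of the four substitution probes is identical to the wild-type probe, which is why the wild-type lane and the matching substitution lane are expected to give similar signals for a wild-type target.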



To demonstrate the ability of the chip to distinguish the
ΔF508 mutation from the wild-type, two synthetic target
nucleic acids were made. The first, a 39-mer complementary to
a subsequence of exon 10 of the CFTR gene having the three
bases involved in the ΔF508 mutation near its center, is
called the "wild-type" or wt508 target, corresponds to
positions 111-149 of the exon, and has the sequence shown
below:

5'-CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA.

The second, a 36-mer probe derived from the wild-type target
by removing those same three bases, is called the "mutant"
target or mu508 target and has the sequence shown below, first
with dashes to indicate the deleted bases, and then without
dashes but with one base underlined (to indicate the base
detected by the T-lane probe, as discussed below):

5'-CATTAAAGAAAATATCAT---TGGTGTTTCCTATGATGA;
5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA.

Both targets were labeled with fluorescein at the 5'-end.

In three separate experiments, the wild-type target, the
mutant target, and an equimolar mixture of both targets were
exposed (0.1 nM wt508, 0.1 nM mu508, and 0.1 nM wt508 plus 0.1
nM mu508, respectively, in a solution compatible with nucleic
acid hybridization) to a CF chip. The hybridization mixture
was incubated overnight at room temperature, and then the chip
was scanned on a reader (a confocal fluorescence microscope in
photon-counting mode; images of the chip were constructed
from the photon counts) at several successively higher
temperatures while still in contact with the target solution.
After each temperature change, the chip was allowed to
equilibrate for approximately one-half hour before being
scanned. After each set of scans, the chip was exposed to
denaturing solvent and conditions to wash, i.e., remove target
that had bound, the chip so that the next experiment could be
done with a clean chip.

The results of the experiments are shown in Figures 18,
19, 20, and 21. Figure 18, in panels A, B, and C, shows an
image made from the region of a DNA chip containing CFTR exon
10 probes; in panel A, the chip was hybridized to a wild-type
target; in panel C, the chip was hybridized to a mutant ΔF508
target; and in panel B, the chip was hybridized to a mixture
of the wild-type and mutant targets. Figure 19, in sheets 1-3,
corresponding to panels A, B, and C of Figure 18, shows
graphs of fluorescence intensity versus tiling position. The
labels on the horizontal axis show the bases in the wild-type
sequence corresponding to the position of substitution in the
respective probes. Plotted are the intensities observed from
the features (or synthesis sites) containing wild-type probes,
the features containing the substitution probes that bound the
most target ("called"), and the feature containing the
substitution probes that bound the target with the second
highest intensity of all the substitution probes ("2nd
Highest").

These figures show that, for the wild-type target and the
equimolar mixture of targets, the substitution probe with a
nucleotide sequence identical to the corresponding wild-type
probe bound the most target, allowing for an unambiguous
assignment of target sequence as shown by letters near the
points on the curve. The target wt508 thus hybridized to the
probes in the wild-type lane of the chip, although the
strength of the hybridization varied from probe to probe,
probably due to differences in melting temperature. The
sequence of most of the target can thus be read directly from
the chip, by inference from the pattern of hybridization in
the lanes of substitution probes (if the target hybridizes
most intensely to the probe in the A-lane, then one infers
that the target has a T in the position of substitution, and
so on).

For the mutant target, the sequence could similarly be
called on the 3'-side of the deletion. However, the intensity
of binding declined precipitously as the point of substitution
approached the site of the deletion from the 3'-end of the
target, so that the binding intensity on the wild-type probe
whose point of substitution corresponds to the T at the 3'-end
of the deletion was very close to background. Following that
pattern, the wild-type probe whose point of substitution
corresponds to the middle base (also a T) of the deletion
bound still less target. However, the probe in the T-lane of
that column set bound the target very well. Examination of
the sequences of the two targets reveals that the deletion
places an A at that position when the sequences are aligned at
their 3'-ends and that the T-lane probe is complementary to
the mutant target with but two mismatches near an end (shown
below in lower-case letters, with the position of substitution
underlined):

Target: 5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA
Probe:            3'-TagTAGTAACCACAA

Thus the T-lane probe in that column set calls the correct
base from the mutant sequence. Note that, in the graph for
the equimolar mixture of the two targets, that T-lane probe
binds almost as much target as does the A-lane probe in the
same column set, whereas in the other column sets, the probes
that do not have wild-type sequence do not bind target at all
as well. Thus, that one column set, and in particular the
T-lane probe within that set, detects the ΔF508 mutation under
conditions that simulate the homozygous case and also
conditions that simulate the heterozygous case.
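
The mismatch pattern of that T-lane probe can be checked directly against the two target sequences given above. The sketch below is only an illustration: the function name mismatch_positions and the window offsets (10 for mu508, 13 for wt508, both 0-based) are assumptions introduced here, chosen so that the probe lies under the complementary stretch of each target.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_positions(probe_3to5, target_5to3, offset):
    """Return the 1-based probe positions (counted from the 3'-end) that
    do not pair with the target window starting at `offset` (0-based)."""
    window = target_5to3[offset:offset + len(probe_3to5)]
    return [
        i + 1
        for i, (p, t) in enumerate(zip(probe_3to5.upper(), window))
        if COMPLEMENT[t] != p
    ]


wt508 = "CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA"
mu508 = "CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA"
t_lane_probe = "TAGTAGTAACCACAA"  # written 3'->5'

print(mismatch_positions(t_lane_probe, mu508, 10))  # prints [2, 3]
print(mismatch_positions(t_lane_probe, wt508, 13))  # prints [7]
```

Against the mutant target the two mismatches sit near the 3'-end of the probe, whereas against the wild-type target the single mismatch falls at the position of substitution in the middle of the probe, which is consistent with the strong binding of mu508 and weak binding of wt508 reported for this probe.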

Although in this example the sequence could not be
reliably deduced near the ends of the target, where there is
not enough overlap between target and probe to allow effective
hybridization, and around the center of the target, where
hybridization was weak for some other reason, perhaps high
AT-content, the results show the method and the probes of the
invention can be used to detect the mutation of interest. The
mutant target gave a pattern of hybridization that was very
similar to that of the wt508 target at the ends, where the two
share a common sequence, and very different in the middle,
where the deletion is located. As one scans the image from
right to left, the intensity of hybridization of the target to
the probes in the wild-type lane drops off much more rapidly
near the center of the image for mu508 than for wt508; in
addition, there is one probe in the T-lane that hybridizes
intensely with mu508 and hardly at all with wt508. The
results from the equimolar mixture of the two targets, which
represents the case one would encounter in testing a
heterozygous individual for the mutation, are a blend of the
results for the separate targets, showing the power of the
invention to distinguish a wild-type target sequence from one
containing the ΔF508 mutation and to detect a mixture of the
two sequences.

The results above clearly demonstrate how the DNA chips
of the invention can be used to detect a deletion mutation,
ΔF508; another model system was used to show that the chips
can also be used to detect a point mutation. One
mutation in the CFTR gene is G480C, which involves the
replacement of the G in position 46 of exon 10 by a T,
resulting in the substitution of a cysteine for the glycine
normally in position #480 of the CFTR protein. The model
target sequences included the 21-mer probe wt480 to represent
the wild-type sequence at positions 37-55 of exon 10:
5'-CCTTCAGAGGGTAAAATTAAG and the 21-mer probe mu480 to
represent the mutant sequence:
5'-CCTTCAGAGTGTAAAATTAAG.
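
The effect of the G-to-T change on the encoded amino acid can be illustrated with the two model sequences. The sketch assumes that the reading frame starts at the first base of each 21-mer, which is an assumption made here for illustration and is not stated in the text; the codon table is a small excerpt of the standard genetic code covering only the codons that occur in these two sequences, and the function names are hypothetical.

```python
# Excerpt of the standard genetic code (only codons present in the 21-mers)
CODONS = {
    "CCT": "Pro", "TCA": "Ser", "GAG": "Glu", "GGT": "Gly",
    "TGT": "Cys", "AAA": "Lys", "ATT": "Ile", "AAG": "Lys",
}


def point_differences(seq_a, seq_b):
    """List (0-based index, base in seq_a, base in seq_b) wherever two
    equal-length sequences differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def translate(seq, frame=0):
    """Translate complete codons starting at `frame` (assumed reading frame)."""
    return [CODONS[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)]


wt480 = "CCTTCAGAGGGTAAAATTAAG"
mu480 = "CCTTCAGAGTGTAAAATTAAG"

print(point_differences(wt480, mu480))  # prints [(9, 'G', 'T')]
print(translate(wt480)[3], "->", translate(mu480)[3])  # prints Gly -> Cys
```

Under that frame assumption the single base change converts a GGT (glycine) codon to TGT (cysteine), in agreement with the G480C designation.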

In separate experiments, a DNA chip was hybridized to
each of the targets wt480 and mu480, respectively, and then
scanned with a confocal microscope. Figure 20, in panels A,
B, and C, shows an image made from the region of a DNA chip
containing CFTR exon 10 probes; in panel A, the chip was
hybridized to the wt480 target; in panel C, the chip was
hybridized to the mu480 target; and in panel B, the chip was
hybridized to a mixture of the wild-type and mutant targets.
Figure 21, in sheets 1-3, corresponding to panels A, B, and
C of Figure 20, shows graphs of fluorescence intensity versus
tiling position. The labels on the horizontal axis show the
bases in the wild-type sequence corresponding to the position
of substitution in the respective probes. Plotted are the
intensities observed from the features (or synthesis sites)
containing wild-type probes, the features containing the
substitution probes that bound the most target ("called"), and
the feature containing the substitution probes that bound the
target with the second highest intensity of all the
substitution probes ("2nd Highest").

These figures show that the chip could be used to
sequence a 16-base stretch from the center of the target wt480
and that discrimination against mismatches is quite good



WO 95/11995 



87 



PCT/US94/12305 



throughout the sequenced region. When the DNA chip was
exposed to the target mu480, only one probe in the portion of
the chip shown bound the target well: the probe in the set of
probes devoted to identifying the base at position 46 in exon
10 and that has an A in the position of substitution and so is
fully complementary to the central portion of the mutant
target. All other probes in that region of the chip have at
least one mismatch with the mutant target and therefore bind
much less of it. In spite of that fact, the sequence of mu480
for several positions to both sides of the mutation can be
read from the chip, albeit with much-reduced intensities from
those observed with the wild-type target.

The results also show that, when the two targets were
mixed together and exposed to the chip, the hybridization
pattern observed was a combination of the other two patterns.
The wild-type sequence could easily be read from the chip, but
the probe that bound the mu480 target so well when only the
mu480 target was present also bound it well when both the
mutant and wild-type targets were present in a mixture, making
the hybridization pattern easily distinguishable from that of
the wild-type target alone. These results again show the
power of the DNA chips of the invention to detect point
mutations in both homo- and heterozygous individuals.

To demonstrate clinical application of the DNA chips of
the invention, the chips were used to study and detect
mutations in nucleic acids from genomic samples. Genomic
samples from an individual carrying only the wild-type gene and
an individual heterozygous for ΔF508 were amplified by PCR
using exon 10 primers containing the promoter for T7 RNA
polymerase. Illustrative primers of the invention are shown
below.

Exon  Name        Sequence
10    CFi9-T7     TAATACGACTCACTATAGGGAGatgacctaataatgatgggttt
10    CFi10c-T7   TAATACGACTCACTATAGGGAGtagtgtgaagggttcatatgc
10    CFi10c-T3   CTCGGAATTAACCCTCACTAAAGGtagtgtgaagggttcatatgc
11    CFi10-T7    TAATACGACTCACTATAGGGAGagcatactaaaagtgactctc
11    CFi11c-T7   TAATACGACTCACTATAGGGAGacatgaatgacatttacagcaa
11    CFi11c-T3   CGGAATTAACCCTCACTAAAGGacatgaatgacatttacagcaa

These primers can be used to amplify exon 10 or exon 11
sequences; in another embodiment, multiplex PCR is employed,
using two or more pairs of primers to amplify more than one
exon at a time.

The product of amplification was then used as a template
for the RNA polymerase, with fluoresceinated UTP present to
label the RNA product. After sufficient RNA was made, it was
fragmented and applied to an exon 10 DNA chip for 15 minutes,
after which the chip was washed with hybridization buffer and
scanned with the fluorescence microscope. A useful positive
control included on many CF exon 10 chips is the 8-mer
3'-CGCCGCCG-5'. Figure 22, in panels A and B, shows an image
made from a region of a DNA chip containing CFTR exon 10
probes; in panel A, the chip was hybridized to nucleic acid
derived from the genomic DNA of an individual with wild-type
sequence at the ΔF508 site; in panel B, the target nucleic acid
originated from a heterozygous (with respect to the ΔF508
mutation) individual. Figure 23, in sheets 1 and 2,
corresponding to panels A and B of Figure 22, shows graphs of
fluorescence intensity versus tiling position.

These figures show that the sequence of the wild-type RNA
can be called for most of the bases near the mutation. In the
case of the ΔF508 heterozygous carrier, one particular probe,
the same one that distinguished so clearly between the
wild-type and mutant oligonucleotide targets in the model
system described above, in the T-lane binds a large amount of
RNA, while the same probe binds little RNA from the wild-type
individual. These results show that the DNA chips of the
invention are capable of detecting the ΔF508 mutation in a
heterozygous carrier.

Further chips were constructed using the block tiling
strategy to provide an array of probes for analyzing a CFTR
mutation. The array comprised 93 µm x 96 µm features arranged
into eleven columns and four rows (44 total probes). Probes
in five of these columns were from four probe sets tiled based
on the wildtype CFTR sequence and having interrogation
positions corresponding to the site of a mutation and two
bases on either side. Five of the remaining columns contained
four sets of probes tiled based on the mutant version of the
CFTR sequence. These probe sets also had interrogation
positions corresponding to the site of mutation and two
nucleotides on either side. The eleventh column contained
four cells for control probes.

Fluorescently labeled hybridization targets were prepared
by PCR amplification. 100 µg of genomic DNA, 0.4 µM of each
primer, 50 µM each dATP, dCTP, dGTP and dUTP (Pharmacia) in
10 mM Tris-Cl, pH 8.3, 50 mM KCl, 2.5 mM MgCl2 and 2 U Taq
polymerase (Perkin-Elmer) were cycled 36 times using a
Perkin-Elmer 9600 thermocycler and the following times and
temperatures: 95°C, 10 sec, 55°C, 10 sec, 72°C, 30 sec. 10
µl of this reaction product was used as a template in a
second, asymmetric PCR reaction. Conditions included 1 µM
asymmetric PCR primer, 50 µM each dATP, dCTP, TTP, 25 µM
fluorescein-dGTP (DuPont), 10 mM Tris-Cl, pH 9.1, 75 mM KCl,
3.5 mM MgCl2. The reaction was cycled 5X with the following
conditions: 95°C, 10 sec, 60°C, 10 sec, 55°C, 1 min. and 72°C,
1.5 min. This was immediately followed with another 20 cycles
using the following conditions: 95°C, 10 sec, 60°C, 10 sec,
72°C, 1.5 min.

Amplification products were fragmented by treating
with 2 U of Uracil-N-glycosylase (Gibco) at 30°C for 30 min.
followed by heat denaturation at 95°C for 5 min. Finally, the
labeled, fragmented PCR product was diluted into hybridization
buffer made up of 5X SSPE and 1 mM Cetyltrimethylammonium
Bromide (CTAB). The dilution factor ranged from 10x to 25x
with 40 µl of sample being diluted into 0.4 ml to 1 ml of
hybridization solution.
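
The stated dilution range can be reproduced with a one-line calculation, assuming that "diluted into" refers to the final volume of the hybridization solution (an interpretation made here, not stated in the text). The function name is hypothetical.

```python
def dilution_factor(sample_ul, final_volume_ul):
    """Fold dilution when `sample_ul` of sample is brought to a final
    volume of `final_volume_ul` (both in microliters)."""
    return final_volume_ul / sample_ul


print(dilution_factor(40, 400))   # prints 10.0
print(dilution_factor(40, 1000))  # prints 25.0
```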

Target hybridization was generally carried out with
the chip shaking in a small dish containing 500 µl to 1 ml
total volume of hybridization solution. All hybridizations
were done at 30°C constant temperature. Alternatively, some
hybridizations were carried out with chips enclosed in a
plastic package with the 1 cm x 1 cm chip glued facing a 250
µl fluid chamber. 250-350 µl of hybridization solution was
introduced and mixed using a syringe pump. Temperature was
controlled by interfacing the back surface of the package with
a Peltier heating/cooling device. Following hybridization,
chips were washed with 5X SSPE, 0.1% Triton X-100 at 25°C-30°C
prior to fluorescent image generation.

Hybridized, washed DNA chips were scanned for
fluorescence using a stage-scanning confocal epifluorescent
microscope and 488 nm argon ion laser excitation. Emitted
light was collected through a band pass filter centered at
530 nm. The resulting fluorescence image was spatially
reconstructed and intensity data were then analyzed. Features
with the peak fluorescence intensity in each column were
identified and compared with any signal intensity at the
remaining single base mismatch probe sites in the same column.
The sequences of the highest intensity features were then
compared across all ten columns of each sub-array to determine
whether peak intensity scores for the wild type sequence and
the mutant sequence were similar or significantly different.
These results were used to generate the genotype call of wild
type (high intensity signals only in wild type probe columns),
mutant (high intensity signals only in the mutant probe
columns) or heterozygous (high intensity signals in both the
wild type and mutant probe columns).
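
The genotype-calling logic described above can be sketched as follows. This is a simplified illustration and not the analysis actually used: the function names, the use of the mean column peak as the score, and the min_ratio threshold relative to background are all assumptions introduced here, since the text does not specify how "high intensity" or "significantly different" were quantified.

```python
def peak_scores(columns):
    """Peak intensity of each column; a column is a list of the four
    feature intensities (one per probe) in that column."""
    return [max(column) for column in columns]


def is_high(columns, background, min_ratio):
    """True when the mean column peak exceeds min_ratio x background
    (hypothetical criterion for a 'high intensity' set of columns)."""
    scores = peak_scores(columns)
    return sum(scores) / len(scores) >= min_ratio * background


def genotype_call(wt_columns, mut_columns, background, min_ratio=3.0):
    """Call the genotype from the five wild type columns and the five
    mutant columns of one sub-array."""
    wt_high = is_high(wt_columns, background, min_ratio)
    mut_high = is_high(mut_columns, background, min_ratio)
    if wt_high and mut_high:
        return "heterozygous"
    if wt_high:
        return "wild type"
    if mut_high:
        return "mutant"
    return "no call"
```

For example, five wild type columns each with a peak near six times background and five mutant columns at background level would be called "wild type" under these assumed thresholds.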

Figure 24 (panel A) shows an image of the fluorescence
signals in arrays designed to detect the G551D(G>A) and
Q552X(C>T) CFTR mutations. The hybridization target is an
exon 11 amplicon generated from wild type genomic DNA. Wild
type hybridization patterns are evident at both locations. No
significant fluorescence signal resulted at any of the
features with probes complementary to mutant or mismatched
sequences. Relative fluorescence intensities were six fold
brighter for the perfect matched wildtype features compared
with the background signal intensity at mutant and mismatch
features. In addition, the sequence at these loci can be
confirmed as AGGTC and GTCAA, respectively, where the bold
type face indicates the mutation sites. Figure 24 (panel B)
shows the same probe array features after hybridization with a
fluorescent target generated from DNA heterozygous for the
G551D mutation. Both the wild type and mutant probe columns
have features with significant fluorescence intensity,
indicating the hybridization of both wild type and mutant CFTR
alleles at this site. Only wildtype probes hybridized with
any significant fluorescence signal in the Q552X subarray,
indicating a wild type target sequence. However, an
additional feature that did not hybridize in the first
experiment shows significant fluorescence intensity in this
experiment. Because the G551D and Q552X mutations are only
two bases apart, the probe sequence in the additional
feature has a perfectly matched 12-mer overlap with the mutant
G551D target.

Figure 25 (panels A and B) illustrates mutation
analysis for ΔF508, a three base pair deletion in Exon 10 of
the CFTR gene. In contrast to the hybridization pattern seen
in base change mutations, in mutations where bases are
inserted or deleted, probe arrays show a different
hybridization pattern. Identical probes are synthesized in
the two central columns of base substitution arrays. As a
result, either mutant or wild type target hybridizations
always result in two side-by-side features (a doublet) with
high fluorescence intensity at the center of the array. In a
heterozygote hybridization, two sets of doublets, one matched
to the wild type sequence and one to the mutant sequence, occur
(Figure 24, panel B). In contrast, wild type and mutant probe
column sequences are offset from each other for deletion or
insertion mutations and hybridization doublets are not seen.
Instead of the six high intensity signals with one doublet,
five independent features in alternating columns characterize
a homozygote and ten features, one in each column, will be
positive with heterozygote targets. This is evident from the
ΔF508 hybridization pattern in Figure 25, panel A. Although a
wildtype target has been hybridized and the highest intensity
features confirm the wild type sequence (ATCTT), there is an
additional hybridization in the first mutant column. Analysis
of that probe sequence shows a 10 base perfect match with the
mutant sequence.

The image in Figure 25, panel B resulted from
hybridizing a DNA chip with a target homozygous for ΔF508. In
this image five features, all with probe sequences
complementary to the mutant, show significant signal. The
mutation sequence bridging the deletion site, ATTGG, is
confirmed. Similar to what was seen in the example of the
G551D mutation, there is added information in neighboring
subarrays designed to detect the ΔI507 and F508C mutations.
This is expected since they are in such close proximity to
ΔF508 that their probe sets significantly overlap the ΔF508
probes. The ΔF508 homozygous target has no perfect matches
with wild type or mutant probes in the ΔI507 and F508C
subarrays. However, there are some low intensity signals
within these two blocks of probes. The F508C array has a
doublet that matches 11 bases of the mutant ΔF508 target.
Similarly, the hybridization in the eighth column of the ΔI507
array has a probe that matches 13/14 bases with the target.

Figure 26 shows hybridization of a heterozygous, double
mutant ΔF508/F508C to the same array as described above.
Conventional reverse dot blot would score this sample as a
homozygous ΔF508 mutant. In the present assays, the ΔF508 and
F508C alleles are separately detected by the respective
subarrays designed to detect these mutations.

C. Chips for Cancer Diagnosis 

There are at least two types of genes which are often
altered in cancerous cells. The first type of gene is an
oncogene such as a mismatch-repair gene, and the second type
of gene is a tumor suppressor gene such as a transcription
factor. Examples of mismatch-repair oncogenes include
hMSH2 (Fishel et al., Cell 75, 1027-1038 (1993)) and hMLH1
(Papadopoulos et al., Science 263, 1625-1628 (1994)). The
most well-known example of a tumor suppressor gene is the p53
protein gene (Buchman et al., Gene 70, 245-252 (1988)). By
monitoring the state of both oncogenes and tumor suppressor
genes (individually and in combination) in a patient, it is
possible to determine individual susceptibility to a cancer and
a patient's prognosis upon cancer diagnosis, and to target
therapy more efficiently.

The p53 gene spans 20 kbp in humans and has 11 exons, 10
of which are protein coding (see Tominaga et al., 1992,
Critical Reviews in Oncogenesis 3:257-282, incorporated herein
by reference). The gene produces a 53 kilodalton
phosphoprotein that regulates DNA replication. The protein
acts to halt replication at the G1/S boundary in the cell
cycle and is believed to act as a "molecular policeman,"
shutting down replication when the DNA is damaged or blocking
the reproduction of DNA viruses (see Lane, 1992, Nature
358:15-16, incorporated herein by reference). The p53
transcription factor is part of a fundamental pathway which
controls cell growth. Wild-type p53 can halt cell growth, or
in some cases bring about programmed cell death (apoptosis).
Such tumor-suppressive effects are absent in a variety of
known p53 gene mutations. Moreover, p53 mutants not only
deprive a cell of wild-type p53 tumor suppression, they also
may spur abnormal cell growth.

In tumor cells, p53 is the most commonly mutated gene
discovered to date (see Levine et al., 1991, Nature
351:453-456, and Hollstein et al., 1991, Science 253:49-53,
each of which is incorporated herein by reference). Over half
of the 6.5 million patients diagnosed with cancer annually
possess p53 mutations in their tumor cells. Among common
tumors, about 70% of colorectal cancers, 50% of lung cancers
and 40% of breast cancers contain p53 mutations. In all, over
51 types of human tumors have been documented to possess p53
mutations, including bladder, brain, breast, cervix, colon,
esophagus, larynx, liver, lung, ovary, pancreas, prostate,
skin, stomach, and thyroid tumors (Culotta & Koshland, Science
262, 1958-1961 (1993); Rodrigues et al., 1990, PNAS
87:7555-7559, incorporated herein by reference). According to
data presented by David Sidransky (1992 San Diego Conference),
over 400 mutations in p53 are known. The presence of a p53
mutation in a tumor has also been correlated with a patient's
prognosis. Patients who possess p53 mutations have a lower
5-year survival rate.

Proper diagnosis of the form of p53 in tumor cells is
critical for clinicians prescribing appropriate therapeutic
regimens. For instance, patients with breast cancer who show
no invasion of nearby lymph nodes generally do not relapse
after standard surgical treatment and chemotherapy. For the
25% who do relapse after surgery and chemotherapy, additional
chemotherapy is appropriate. At present, there is no clear
way to determine which patients will benefit from such
additional chemotherapy prior to relapse. However,
correlating p53 mutations to tumorigenicity and metastasis
provides clinicians with a means to determine whether such
additional treatments are warranted.

In addition to facilitating conventional chemotherapy,
appropriate diagnosis of p53 mutations provides clinicians
with the ability to identify individuals who will benefit the
most from gene therapy techniques, in which appropriately
operative p53 copies are restored to a tumor site. Clinical
p53 gene therapy trials are presently underway (Culotta &
Koshland, supra).

The analysis of p53 mutations can also be used to
identify which carcinogens lead to particular tumors (Harris,
Science 262, 1980-1981 (1993)). For instance, dietary
aflatoxin B1 exposure is associated with G:C to T:A
transversions at residue 249 of p53 in hepatocellular
carcinomas (Hsu et al., Nature 350, 427 (1991); Bressac et
al., Nature 350, 429 (1991); Harris, supra).

While most described p53 mutations are somatic in origin,
some types of cancer are associated with germline p53
mutation. For instance, Li-Fraumeni syndrome is a hereditary
condition in which individuals receive mutant p53 alleles,
resulting in the early onset of various cancers (Harris,
supra; Frebourg et al., PNAS 89, 6413-6417 (1992); Malkin et
al., Science 250, 1233 (1990)). These mutations are
associated with instability in the rest of the genome,
creating multiple genetic alterations, and eventually leading
to cancer.

hMLH1 and hMSH2 are mismatch repair genes which are
causal agents in hereditary nonpolyposis colorectal cancer in
individuals with mutant hMLH1 or hMSH2 alleles (Fishel et al.,
supra, and Papadopoulos et al., supra). Hereditary
nonpolyposis colorectal cancer is a common genetic disorder,
affecting about 1 in 200 individuals (Lynch et al.,
Gastroenterology 104, 1535 (1993)). Detection of hMLH1 and
hMSH2 mutations in the population allows diagnosis of
nonpolyposis colorectal cancer prone individuals prior to the
manifestation of disease. This allows for the implementation
of special screening programs for cancer-prone individuals to
ensure early detection of cancer, thereby enhancing survival
rates of afflicted individuals. In addition, genetic
counselors may use the information derived from hMLH1 and
hMSH2 chips to improve family planning as described for cystic
fibrosis chips. The detection of mutations in hMLH1 and hMSH2
individually or in combination with p53 can also be used by
clinicians to assess cancer prognosis and treatment modality.
Finally, the information can be used to target appropriate
individuals for gene therapy.

The entire hMLH1 gene is less than 85 kbp in length,
comprising 2268 coding nucleotides (Papadopoulos et al.,
supra). Sequences from the gene have been deposited with
GenBank (accession number U07418). Mutations associated with
hereditary nonpolyposis colorectal cancer include the deletion
of exon 5 (codons 578-632), a 4 base pair deletion of codons
727 and 728 resulting in a shift in the reading frame of the
gene, a 4 base pair insertion at codons 755 and 756 resulting
in an extension of the COOH terminus, a 371 base pair deletion
and frameshift mutation at position 347, and a transversion
causing an alteration of codon 252 resulting in the insertion
of a stop codon (id.).

hMSH2 is a human homologue of the bacterial MutS and S.
cerevisiae MSH mismatch-repair genes. MSH2, like hMLH1, is
associated with hereditary nonpolyposis cancer. Although only
a few MSH2 gene samples from tumor tissue have been
characterized, at least some tumor samples show a T to C
transition mutation at position 2020 of the cDNA sequence,
resulting in the loss of an intron-exon splice acceptor site.
In view of the role of mutations in p53, MSH2 and/or
hMLH1 in hereditary predisposition to cancer, to neoplastic
transformation events leading to cancer and to cancer
prognosis, it is important to screen individuals to determine
whether they possess mutant alleles, and to identify precisely
mutations are point mutations, or extremely small insertions
or deletions, which are generally undetectable by standard
Southern analysis, accurate diagnosis requires a capacity to
examine a gene nucleotide-by-nucleotide.

Mutations in the hMSH2, hMLH1 or p53 genes, irrespective
of whether previously characterized, can be detected by any of
the tiling strategies noted above. Reference sequences of
interest include full-length genomic and cDNA sequences of
each of these genes and subsequences thereof, such as exons
and introns. For example, each nucleotide in the 20 kb p53
genomic sequence can be tiled using the basic strategy with an
array of about 80,000 probes. As in the CFTR chip, some
reference sequences are comparatively short sequences
including the site of a known mutation and a few flanking
nucleotides. Some chips tile reference sequences that
encompass mutational "hot spots." For instance, a variety of
cellular and oncoviral proteins bind to specific regions of
p53, including Mdm2, SV40 T antigen, E1b from adenovirus and
E6 from human papilloma virus. These binding sites correlate
to some extent with observed high frequency somatic mutation
regions of p53 found in tumor cells from cancer patients (see
Harris et al., supra). Hot spots include exons 2, 3, 5, 6, 7
and 8 and the intronic regions between exons 2 and 3, 3 and 4,
and 4 and 5. Fragments of the hMLH1 gene of particular
interest include those encoding codons 578-632, 727, 728, 347
and 252. Some chips are tiled to read mutations in each of the
hMSH2, hMLH1 and p53 genes, both wildtype and mutant versions.
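The figure of about 80,000 probes for a 20 kb reference follows
if each nucleotide is interrogated by four probes, one per
possible base. The sketch below illustrates that arithmetic and a
toy version of such a tiling. It is an illustration under stated
assumptions, not the chip design itself: the function names, the
15-base probe length and the interrogation offset of 7 are chosen
here only for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand complementary to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_count(n_bases, probes_per_base=4):
    """Approximate array size, ignoring the ends of the reference."""
    return n_bases * probes_per_base


def basic_tiling(reference, probe_len=15, offset=7):
    """Build four probes for every full window of the reference.

    Each tuple holds (interrogated index, base placed there, probe).
    The probe complements the window with that base substituted at
    the interrogation position.
    """
    probes = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        for base in "ACGT":
            variant = window[:offset] + base + window[offset + 1:]
            probes.append((start + offset, base, reverse_complement(variant)))
    return probes


print(probe_count(20_000))                 # 80000
print(len(basic_tiling("ACGTACGTACGTACG")))  # 4 probes for one window
```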
Standard or asymmetric PCR can be used to generate the
target DNA used in the tiling assays described above. In
general, PCR is used to amplify hMSH2, hMLH1 or p53 sequences
from a tissue of interest such as a tumor. Mixed PCR
reactions can also be used to generate hMSH2, hMLH1 or p53
sequences simultaneously in a single reaction mixture. Any of
the coding or noncoding sequences from the genes may be
amplified for use in the block tiling assays described above.

Table 8 below provides examples of primers which are
useful in synthesizing specific regions of hMSH2, hMLH1 and
p53. Other primers can readily be devised from the known
genomic and cDNA sequences of the genes. The primers
described in Table 8 specific for p53 amplification have ends
tailored to facilitate cloning into standard restriction
enzyme cloning sites.

Table 8: Examples of PCR primers useful in amplifying regions of p53, hMLH1 and
hMSH2.

Region          Primer Sequence                       Description
Amplified

Exon 5 (p53)    TAA TAC GAC TCA CTA TAG GGA GA CCC    Exon 5 T7 Primer (5' T7
                TGG GCA ACC AGC CCT GTC GT            to p53 3').

Exon 5 (p53)    ATG CAA TTA ACC CTC ACT AAA GGG       Exon 5 T3 Primer (5' T3
                AGA CAC TTG TGC CCT GAC TTT CAA C     to p53 3').

Exon 6 (p53)    TAA TAC GAC TCA CTA TAG GGA GCC       Exon 6 T7 Primer (5' T7
                TCC TCC CAG AGA CCC                   to p53 3').

Exon 6 (p53)    ATG CAA TTA ACC CTC ACT AA GGG AGA    Exon 6 T3 Primer (5' T3
                TCC CCA GGC CTC TGA TTC CTC ACT G     to p53 3').

Exon 7 (p53)    TAA TAC GAC TCA CTA TAG GGA CTG       Exon 7 T7 Primer (5' T7
                GGG CAC AGC CAG GCC AGT GTG CA        to p53 3').

Exon 7 (p53)    ATG CAA TTA ACC CTC ACT AAA GGG       Exon 7 T3 Primer (5' T3
                AGA GTC TCC CCA AGG CGC ACT GGC       to p53 3').
                CTC A

Exon 8 (p53)    TAA TAC GAC TCA CTA TAG GGA GGG       Exon 8 T7 Primer (5' T7
                CAT AAC TGC ACC CTT GGT CTC CTC C     to p53 3').

Exon 8 (p53)    ATG CAA TTA ACC CTC ACT AAA GGG       Exon 8 T3 Primer (5' T3
                AGA GGA CCT GAT TTC CTT ACT GCC TCT   to p53 3').
                TGC

hMSH2           GAC ATG GCG GTG CAG CCG AAG GAG A     Primer for MSH2, 5' to
                                                      3'. If used with MSH2
                                                      primer below, a 3033
                                                      base pair amplicon will
                                                      result.

hMSH2           CTA TGT CAA TTG CAA ACA GTG CTC AGT   Primer for hMSH2, 5' to
                TAC AG                                3'.

hMLH1           CTT GGC TCT TCT GGC GCC AAA ATG TCG   Primer for hMLH1, 5' to
                TTC                                   3'. If used with hMLH1
                                                      primer below, a 2484
                                                      base pair amplicon will
                                                      result.

hMLH1           TAT GTT AAG ACA CAT CTA TTT ATT TAT   Primer for hMLH1, 5' to
                AATCAATCC                             3'.

After PCR amplification of the target amplicon, one strand
of the amplicon can be isolated, e.g., using a biotinylated
primer that allows capture of the undesired strand on
streptavidin beads. Alternatively, asymmetric PCR can be used
to generate a single-stranded target. Another approach
involves the generation of single-stranded RNA from the PCR
product by incorporating a T7 or other RNA polymerase promoter
in one of the primers. The single-stranded material can
optionally be fragmented to generate smaller nucleic acids
with less significant secondary structure than longer nucleic
acids.

In one such method, fragmentation is combined with
labeling. To illustrate, degenerate 8-mers or other
degenerate short oligonucleotides are hybridized to the
single-stranded target material. In the next step, a DNA
polymerase is added with the four different
dideoxynucleotides, each labeled with a different fluorophore.
Fluorophore-labeled dideoxynucleotides are available from a
variety of commercial suppliers. Hybridized 8-mers are
extended by a labeled dideoxynucleotide. After an optional
purification step, e.g., with a size exclusion column, the
labeled 9-mers are hybridized to the chip. Other methods of
target fragmentation can be employed. The single-stranded DNA
can be fragmented by partial degradation with a DNAse or
partial depurination with acid. Labeling can be accomplished
in a separate step, e.g., fluorophore-labeled nucleotides are
incorporated before the fragmentation step or a DNA binding
fluorophore, such as ethidium homodimer, is attached to the
target after fragmentation.

Exemplary Chips 

a. Exon VI Chip 

To illustrate the value of the DNA chips of the present
invention in such a method, a DNA chip was synthesized by the
VLSIPS™ method to provide an array of overlapping probes which
represent or tile across a 60 base region of exon 6 of the p53
gene. To demonstrate the ability to detect substitution
mutations in the target, twelve different single substitution
mutations (wild type and three different substitutions at each
of three positions) were represented on the chip along with
the wild type. Each of these mutations was represented by a
series of twelve 12-mer oligonucleotide probes, which were
complementary to the wild type target except at the one
substituted base. Each of the twelve probes was complementary
to a different region of the target and contained the mutated
base at a different position (e.g., if the substitution was at
base 32, the set of probes would be complementary, with the
exception of base 32, to regions of the target 21-32, 22-33,
and so on through 32-43). This enabled investigation of the
effect of the substitution position within the probe. The
alignment of some of the probes with a 12-mer model target
nucleic acid is shown in Figure 27.
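The set of target regions covered by such a probe series can be
enumerated directly. The short sketch below reproduces the base 32
example from the text; the function name and the 1-based, inclusive
coordinate convention are assumptions made for illustration.

```python
def probe_windows(sub_pos, probe_len=12):
    """List the target regions of every probe covering sub_pos.

    Returns (start, end, offset) tuples in 1-based inclusive target
    coordinates, where offset is the position of the substituted
    base within the region, counted from its start.
    """
    windows = []
    for start in range(sub_pos - probe_len + 1, sub_pos + 1):
        end = start + probe_len - 1
        windows.append((start, end, sub_pos - start + 1))
    return windows


for start, end, offset in probe_windows(32):
    print(f"{start}-{end}: substituted base at position {offset}")
```

For a substitution at base 32 this yields twelve regions, from
21-32 through 32-43, with the substituted base moving one position
per probe. Calling probe_windows(32, probe_len=10) gives the
corresponding ten regions for a 10-mer series.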

To demonstrate the effect of probe length, an additional
series of ten 10-mer probes was included for each mutation
(see Figure 28). In the vicinity of the substituted
positions, the wild-type sequence was represented by every
possible overlapping 12-mer and 10-mer probe. To simplify
comparisons, the probes corresponding to each varied position
were arranged on the chip in rectangular regions with the
following structure: each row of cells represents one
substitution, with the top row representing the wild type.
Each column contains probes complementary to the same region
of the target, with probes complementary to the 3'-end of the
target on the left and probes complementary to the 5'-end of
the target on the right. The difference between two adjacent
columns is a single base shift in the positioning of the
probes. Whenever possible, the series of 10-mer probes were
placed in four rows immediately underneath and aligned with
the 4 rows of 12-mer probes for the same mutation.
To provide model targets, 5'-fluoresceinated 12-mers
containing all possible substitutions in the first position of
codon 192 were synthesized (see the starred position in the
target in Figure 27). Solutions containing 10 nM target DNA
in 6X SSPE, 0.25% Triton X-100 were hybridized to the chip at
room temperature for several hours. While target nucleic acid
was hybridized to the chip, the fluorophores on the chip were
excited by light from an argon laser, and the chip was scanned
with an autofocusing confocal microscope. The emitted signals
were processed by a PC to produce an image using image
analysis software. By 1 to 3 hours, the signal had reached a
plateau; to remove the hybridized target and allow
hybridization to another target, the chip was stripped with
60% formamide, 2X SSPE at 17°C for 5 minutes. The washing
buffer and temperature can vary, but the buffer typically
contains 2-to-3X SSPE, 10-to-60% formamide (one can use
multiple washes, increasing the formamide concentration by 10%
each wash, and scanning between washes to determine when the
wash is complete), and optionally a small percentage of Triton
X-100, and the temperature is typically in the range of
15-to-18°C.

Very distinct patterns were observed after hybridization
with targets with 1 base substitutions and visualization with
a confocal microscope and software analysis, as shown in
Figure 29. In general, the probes which form perfect matches
with the target retain the highest signal. For example, in
the first image, the 12-mer probes that form perfect matches
with the wild-type (WT) target are in the first row (top).
The 12-mer probes with single base mismatches are located in
the second, third, and fourth rows and have much lower
signals. The data is also depicted graphically in Figure 30.
On each graph, the X ordinate is the position of the probe in
its row on the chip, and the Y ordinate is the signal at that
probe site after hybridization. When a target with a
different one base substitution is hybridized, the
complementary set of probes has the highest signal (see
pictures 2, 3, and 4 in Figure 29 and graphs 2, 3, and 4 in
Figure 30). In each case, the probe set with no mismatches
with the target has the highest signals. Within a 12-mer
probe set, the signal was highest at position 6 or 7. The
graphs show that the signal difference between 12-mer probes
at the same X ordinate tended to be greatest at positions 5
and 8 when the target and the complementary probes formed 10
base pairs and 11 base pairs, respectively. Because tumors
often have both WT and mutant p53 genes, mixed target
populations were also hybridized to the chip, as shown in
Figure 31. When the hybridization solution consisted of a 1:1
mixture of WT 12-mer and a 12-mer with a substitution in
position 7 of the target, the sets of probes that were
perfectly matched to both targets showed higher signals than
the other probe sets.

The hybridization efficiency of a 10-mer probe array
relative to a 12-mer probe array was also evaluated. The
10-mer and 12-mer probe arrays gave comparable signals (see
graphs 1-4 in Figure 30 and graphs 1-4 in Figure 32).
However, the 10-mer probe sets, which are in rows 5-8 (see
images in Figure 29), seemed to be better in this model system
than the 12-mer probe sets at resolving one target from
another, consistent with the expectation that one base
mismatches are more destabilizing for 10-mers than 12-mers.
Hybridization results within probe sets perfectly matched to
target also followed the expectation that the more matches
the individual probe formed with the target, the higher the
signal. However, duplexes with two 3' dangles (see Figure 30,
position 6 in graphs 1-4) have about as much signal as the
probes which are matched along their entire length (see Figure
30, position 7, in graphs 1-4).

This illustrative model system shows that 12-mer targets
that differ by one base substitutions can be readily
distinguished from one another by the novel probe array
provided by the invention and that resolution of the different
12-mer targets was somewhat better with the 10-mer probe sets
than with the 12-mer probe sets.

b. Exon V Chip

To analyze DNA from exon 5 of the p53 tumor suppressor
gene, a set of overlapping 17-mer probes was synthesized on a
chip. The probes for the WT allele were synthesized so as to
tile across the entire exon with single base overlaps between
probes. For each WT probe, a set of 4 additional probes, one
for each possible base substitution at position 7, was
synthesized and placed in a column relative to the WT probe.
Exon 5 DNA was amplified by PCR with primers flanking the
exon. One of the primers was labeled with fluorescein; the
other primer was labeled with biotin. After amplification,
the biotinylated strand was removed by binding to streptavidin
beads. The fluoresceinated strand was used in hybridization.

About 1/3 of the amplified, single-stranded nucleic acid
was hybridized overnight in 5X SSPE at 60°C to the probe chip
(under a cover slip). After washing with 6X SSPE, the chip
was scanned using confocal microscopy. Figure 33 shows an
image of the p53 chip hybridized to the target DNA. Analysis
of the intensity data showed that 93.5% of the 184 bases of
exon 5 were called in agreement with the WT sequence (see
Buchman et al., 1988, Gene 70:245-252, incorporated herein by
reference). The miscalled bases were from positions where
probe signal intensities were tied (1.6%) and where non-WT
probes had the highest signal intensity (4.9%). Figure 34
illustrates how the actual sequence was read. Gaps in the
sequence of letters in the WT rows correspond to control
probes or sites. Positions at which bases are miscalled are
represented by letters in italic type in cells corresponding
to probes in which the WT bases have been substituted by other
bases.
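The call statistics above can be pictured with a simple rule: at
each position, call the base whose probe gives the highest signal,
and make no call on a tie. The sketch below is a hedged
illustration of that rule, not the analysis software actually used.
The function names and data layout are assumptions. The counts
172, 3 and 9 are inferred from the reported percentages of 184
bases and are not stated in the text.

```python
def call_base(intensities):
    """Return the base with the single highest signal, or None on a tie.

    intensities: dict mapping each base to the signal of the probe
    carrying that base at the interrogated position.
    """
    top = max(intensities.values())
    winners = [base for base, value in intensities.items() if value == top]
    return winners[0] if len(winners) == 1 else None


def score_calls(reference, columns):
    """Count calls that agree with the reference, tie, or disagree."""
    correct = tied = wrong = 0
    for ref_base, column in zip(reference, columns):
        call = call_base(column)
        if call is None:
            tied += 1
        elif call == ref_base:
            correct += 1
        else:
            wrong += 1
    return correct, tied, wrong


# Counts consistent with the reported percentages for 184 bases.
for n in (172, 3, 9):
    print(n, round(100 * n / 184, 1))
```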

As the diagram indicates, the miscalled bases are from
the low intensity areas of the image, which may be due to
secondary structure in the target or probes preventing
intermolecular hybridization. To diminish the effects due to
secondary structure, one can employ shorter targets (e.g., by
target fragmentation) or use more stringent hybridization
conditions. In addition, the use of a set of probes
synthesized by tiling across the other strand of a duplex
target can also provide sequence information buried in
secondary structure in the other strand. It should be
appreciated, however, that the pattern of low intensity areas
that forms as a result of secondary structure in the target
itself provides a means to identify that a specific target
sequence is present in a sample. Other factors that may
contribute to lower signal intensities include differences in
probe densities and hybridization stabilities.

These results demonstrate the advantages provided by the
DNA chips of the invention to genetic analysis. As another
example, heterozygous mutations are currently sequenced by an
arduous process involving cloning and repurification of DNA.
The cloning step is required because gel sequencing
systems are poor at resolving even a 1:1 mixture of DNA.
First, the target DNA is amplified by PCR with primers
allowing easy ligation into a vector, which is taken up by
transformation of E. coli, which in turn must be cultured,
typically on plates overnight. After growth of the bacteria,
DNA is purified in a procedure that typically takes about 2
hours; then, the sequencing reactions are performed, which
takes at least another hour, and the samples are run on the
gel for several hours, the duration depending on the length of
the fragment to be sequenced. By contrast, the present
invention provides direct analysis of the PCR amplified
material after brief transcription and fragmentation steps,
saving days of time and labor.

D. Mitochondrial Genome Chips

A human cell may have several hundred mitochondria, each
with more than one copy of mtDNA. There is strand asymmetry
in the base compositions, with one strand (Heavy) being
relatively G rich, and the other strand (Light) being C rich.
The L strand is 30.9% A, 31.2% C, 13.1% G, and 24.7% T. Human
mtDNA is information-rich, encoding some 22 tRNAs, 12S and 16S
rRNAs, and 13 polypeptides involved in oxidative
phosphorylation. No introns have been detected. RNAs are
processed by cleavage at tRNA sequences, and polyadenylated
post-transcriptionally. In some transcripts, polyadenylation
also creates the stop codon, illustrating the parsimony of
coding. In many individuals, mtDNA can be treated as haploid.
However, some individuals are heteroplasmic (have more than
one mtDNA sequence), and the degree of heteroplasmy can vary
from tissue to tissue. Also, the rate of replication of
mtDNAs can differ and, together with random segregation during
cell division, can lead to changes in heteroplasmy over time.
The human mitochondrial genome is 16,569 nucleotides
long. The sequence of the L-strand is numbered arbitrarily
from the MboI-5/7 boundary in the D-loop region. The complete
sequence of the human mitochondrial genome has been published.
See Anderson et al., Nature 290, 457-465 (1981).
Mitochondrial DNA is maternally inherited, and has a mutation
rate estimated to be tenfold higher than single copy nuclear
DNA (Brown et al., Proc. Natl. Acad. Sci. USA 76, 1967-1971
(1979)). Human mtDNAs differ, on average, by about 70 base
substitutions (Wallace, Ann. Rev. Biochem. 61, 1175-1212
(1992)). Over 80% of substitutions are transitions (i.e.,
pyrimidine-pyrimidine or purine-purine).
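The distinction between a transition and a transversion can be
stated compactly in code. The following is a small illustrative
sketch; the function name is an assumption introduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine-purine or pyrimidine-pyrimidine substitutions.

    Any other substitution between two different bases is a
    transversion, for which this function returns False.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


print(is_transition("A", "G"))  # purine to purine: transition
print(is_transition("G", "T"))  # purine to pyrimidine: transversion
```

Each base has exactly one transition partner and two transversion
partners, which is why a tiling that anticipates transitions can
manage with fewer probes per position.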

Analysis of mitochondrial DNA serves several purposes.
Detection of mutations in the mitochondrial genome allows
diagnosis of a number of diseases. The mitochondrial genome
has been identified as the locus of several mutations
associated with human diseases. Some of the mutations result
in stop codons in structural genes. Such mutations have been
mapped and associated with diseases, such as Leber's
hereditary optic neuropathy, neurogenic muscular weakness,
ataxia and retinitis pigmentosa. Other mutations (nucleotide
substitutions) occur in tRNA coding sequences, and presumably
cause conformational defects in transcribed tRNA molecules.
Such mutations have also been mapped and associated with
diseases such as Myoclonic Epilepsy and Ragged Red Fiber
Disease. Another type of mutation commonly found is deletions
and/or insertions. Some deletions span segments of several
kb. Again, such mutations have been mapped and associated
with diseases, for example, ocular myopathy and Pearson
Syndrome. See Wallace, Ann. Rev. Biochem. 61, 1175-1212 (1992)
(incorporated by reference in its entirety for all purposes).
Early detection of such diseases allows metabolic or genetic
therapy to be administered before irretrievable damage has
occurred. Id. Analysis of mitochondrial DNA is also
important for forensic screening. Because the mitochondrial
genome is a locus of high variability between individuals,
sequencing a substantial length of mitochondrial DNA provides
a fingerprint that is highly specific to an individual.
Analysis of mitochondrial DNA is also important for
evolutionary and epidemiological studies.

The reference sequence can be an entire mitochondrial
genome or any fragment thereof. For forensic and
epidemiological studies, the reference sequence is often all
or part of the D-loop region in which variability between
individuals is greatest (e.g., from 16024-16401 and 29-408).
For detection of mutations, analysis of the entire genome is
useful as a reference sequence, but shorter segments including
the sites of known mutations and about 1-20 flanking bases
are also useful. Some chips have probes tiling paired
reference sequences, representing wildtype and mutant versions
of a sequence. Tiling a second reference sequence is
particularly useful for detecting an insertion mutation
occurring in 30-50% of ocular myopathy and Pearson syndrome
patients, which consists of direct repeats of the sequence
ACCTCCCTCACCA. Some chips include reference sequences from
more than one mitochondrial genome.

Mitochondrial reference sequences can be tiled using any
of the strategies noted above. The block tiling strategy is
particularly useful for analyzing short reference sequences or
known mutations. Either the block strategy or the basic
strategy is suitable for analyzing long reference sequences.
In many of the tiling strategies, it is possible to use fewer
probes than are used in other chips without significant loss
of sequence information. As noted above, most point mutations
in mitochondrial DNA are transitions, so for each wildtype
nucleotide in a reference sequence, one of the three possible
nucleotide substitutions is much more likely than the other
two. Accordingly, in the basic tiling strategy, for example, a
reference sequence can be tiled using only two probe sets.
One probe set comprises a plurality of probes, each probe
having a segment exactly complementary to the reference
sequence. The second probe set comprises a corresponding probe
for each probe in the first set. However, a probe from the
second probe set differs from the corresponding probe from the
first probe set in an interrogation position, in which the
probe from the second probe set includes the transition of the
nucleotide present in that position in the probe from the
first probe set.
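
The two-probe-set scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe length (9) and the interrogation index (4) are arbitrary choices not taken from the text, probes are written 3' to 5' so that they align base-by-base with the reference, and the function name is hypothetical.

```python
# Watson-Crick complements and transition partners (purine<->purine,
# pyrimidine<->pyrimidine).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def two_set_tiling(reference, probe_len=9, interrogation_index=4):
    """Return (reference_position, matched_probe, transition_probe) tuples.

    probe_len and interrogation_index are illustrative assumptions.
    """
    pairs = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        # First probe set: exactly complementary to the reference window.
        matched = "".join(COMPLEMENT[base] for base in window)
        # Second probe set: same probe, but with the transition of the
        # base at the interrogation position.
        swapped = TRANSITION[matched[interrogation_index]]
        variant = (matched[:interrogation_index] + swapped
                   + matched[interrogation_index + 1:])
        pairs.append((start + interrogation_index, matched, variant))
    return pairs


# Example using the repeat sequence quoted earlier in this section.
pairs = two_set_tiling("ACCTCCCTCACCA")
for position, matched, variant in pairs:
    print(position, matched, variant)
```

With two probes per interrogated position instead of four, the probe count is halved relative to a tiling that includes all three possible substitutions.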

Target mitochondrial DNA can be amplified, labelled and
fragmented prior to hybridization using the same procedures as
described for other chips. Use of at least two labelled
nucleotides is desirable to achieve uniform labelling. Some
exemplary primers are described below and other primers can be
designed from the known sequence of mitochondrial DNA.
Because mitochondrial DNA is present in multiple copies per
cell, it can also be hybridized directly to a chip without
prior amplification.

Exemplary Chips 

The invention provides a DNA chip for analyzing sequences
contained in a 1.3 kb fragment of human mitochondrial DNA from
the "D-loop" region, the most polymorphic region of human
mitochondrial DNA. One such chip comprises a set of 269
overlapping oligonucleotide probes of varying length in the
range of 9-14 nucleotides with varying overlaps arranged in
~600 x 600 micron features or synthesis sites in an array 1 cm
x 1 cm in size. The probes on the chip are shown in columnar
form below. An illustrative mitochondrial DNA chip of the
invention comprises the following probes (X, Y coordinates are
shown, followed by the sequence; "DL3" represents the 3'-end
of the probe, which is covalently attached to the chip
surface).

0 0 DL3AGTGGGGTATTT         1 1 DL3GGTTGGTTTGGG
1 0 DL3GGGTATTTAGTT         2 1 DL3TGGGGTTTCTAG
2 0 DL3TTAGTTTATCCAA        3 1 DL3GTTTCTAGTGGG
3 0 DL3ATCCAAACCAGG         4 1 DL3AGTGGGGGGTGT
4 0 DL3ACCAGGATCGGA         5 1 DL3GGGGTGTCAAAT
5 0 DL3CGTGTGTGTGTGG        6 1 DL3GTCAAATACATCG
6 0 DL3CGTGTGTGTGTGGC       7 1 DL3ACATCGAATGGAG
7 0 DL3TCGTGTGTGTGTGG       8 1 DL3CGAATGGAGGAG
8 0 DL3GTAGGATGGGTC         9 1 DL3GAGGAGTTTCGT
9 0 DL3AGGATGGGTCGT         10 1 DL3TTTCGTTATGTGA
10 0 DL3GATGGGTCGTGT        11 1 DL3ATGTGACTTTTAC
11 0 DL3TGGCGACGATTG        12 1 DL3GACTTTTACAAAT
12 0 DL3GCGACGATTGGG        13 1 DL3AAATCTGCCCGA
13 0 DL3TGGGGGGGA           14 1 DL3AATCTGCCCGAG
14 0 DL3GAGGGGGCG           15 1 DL3CCCGAGTGTAGT
15 0 DL3GGAGGGGGCGA         16 1 DL3AGTGTAGTGGGG
16 0 DL3GAGGGGGCGA          0 2 DL3GGGAGGGTGAG
0 1 DL3GGCTTGGTTGG          1 2 DL3GGTGAGGGTATG
5 DL3ATTGTTAAACTTA
5 DL3AAACTTACAGACG
5 DL3ACAGACGTGTCG
5 DL3GTGTCGGTGAAA
5 DL3GTGAAAGGTGTGT
5 DL3GGTGTGTCTGTAG
5 DL3TGTGTCTGTAGTA
5 DL3GTAGTATTGTTTT
5 DL3AGTATTGTTTTTT
6 DL3CCTCGTGGGATA
6 DL3TGGGATACAGCG
6 DL3GATACAGCGTCAT
6 DL3GCGTCATAGACAG
6 DL3AGACAGAAACTAA
6 DL3CAGAAACTAAGGA
6 DL3TAAGGACGGAGT
6 DL3GACGGAGTAGGA
6 DL3GTAGGATAATAAA
6 DL3TAATAAATAGCG
6 DL3ATAGCGTAGGAT
6 DL3TAGCGTAGGATG
6 DL3AGGATGCAAGTT
6 DL3ATGCAAGTTATAA
6 DL3GTTATAATGTCCG
6 DL3ATGTCCGCTTGT
6 DL3TCCGCTTGTATG
7 DL3GTGAGTGCCCTC
7 DL3TGCCCTCGAGAG
7 DL3CCTCGAGAGGTA
7 DL3AGAGGTACGTAA
7 DL3ACGTAAACCATA
7 DL3ACCATAAAAGCAG
7 DL3AAAGCAGACCC
7 DL3AGACCCCCCAT
7 DL3CCCCCATACGT
7 DL3CATACGTGCGCT
7 DL3GTGCGCTATCAG
7 DL3GCGCTATCAGTA
7 DL3TCAGTAACGCTC
7 DL3GTAACGCTCTGC
7 DL3CTCTGCGACCTC
7 DL3GACCTCGGCCT
7 DL3TCGGCCTCGTG
8 DL3GATGAAGTCCCAG
8 DL3AGTCCCAGTATTT
8 DL3GTATTTCGGATTT
8 DL3TCGGATTTATCG
8 DL3GATTTATCGGGT
8 DL3ATCGGGTGTGCA
8 DL3TGTGCAAGGGGA
8 DL3CAAGGGGAATTT
8 DL3GAATTTATTCTGTA
8 DL3TCTGTAGTGCTAC
8 DL3GTAGTGCTACCT
8 DL3GCTACCTAGTAG
8 DL3CTAGTAGTCCAGA
8 DL3TCCAGATAGTGGG
10 11 DL3GTCATGTATCATGT     7 16 DL3GGTTCCTGTTTA
11 11 DL3TCATGTATTTCGG      8 16 DL3CCTGTTTAHTPTP
12 11 DL3TATTTCGGTAAA       9 16 DL3TTAGTCTCTTTTT
13 11 DL3TTCGGTAAATGG       10 16 DL3CTTTTTCAGAAAT
14 11 DL3GTAAATGGCATGT      11 16 DL3AGAAATTGAGGTG
15 11 DL3GCATGTAATCGTG      12 16 DL3AAATTGAGGTGGT
16 11 DL3GTAATCGTGTAAT      13 16 DL3GGTGGTAATCGT
5 12 DL3GGGAGGGGTAC         14 16 DL3TAATCGTGGGTT
6 12 DL3GGGTACGAATGT        15 16 DL3GTGGGTTTCGAT
7 12 DL3ACGAATGTTCGTT       16 16 DL3GGTTTCGATTCT

No probes were present in positions X, Y = 0, 12 to X, Y = 4,
12; X, Y = 0, 13 to X, Y = 4, 13; X, Y = 0, 14 to X, Y = 4,
14; X, Y = 0, 15 to X, Y = 4, 15; or X, Y = 0, 16 to X, Y = 4,
16.

The length of each of the probes on the chip was variable to
minimize differences in melting temperature and potential for
cross-hybridization. Each position in the sequence was
represented by at least one probe and most positions were
represented by 2 or more probes. As noted above, the amount
of overlap between the oligonucleotides varied from probe to
probe. Figure 35 shows the human mitochondrial genome; "O_H"
is the H strand origin of replication, and arrows indicate the
cloned unshaded sequence.

DNA was prepared from hair roots of six human donors (mt1
to mt6) and then amplified by PCR and cloned into M13; the
resulting clones were sequenced using chain terminators to
verify that the desired specific sequences were present. DNA
from the sequenced M13 clones was amplified by PCR,
transcribed in vitro, and labeled with fluorescein-UTP using
T3 RNA polymerase. The 1.3 kb RNA transcripts were fragmented
and hybridized to the chip. The results showed that each
different individual had DNA that produced a unique
hybridization fingerprint on the chip and that the differences
in the observed patterns could be correlated with differences
in the cloned genomic DNA sequence. The results also
demonstrated that very long sequences of a target nucleic acid
can be represented comprehensively as a specific set of
overlapping oligonucleotides and that arrays of such probe
sets can be usefully applied to genetic analysis.

The sample nucleic acid was hybridized to the chip in a
solution composed of 6X SSPE, 0.1% Triton X-100 for 60
minutes at 15 °C. The chip was then scanned by confocal
scanning fluorescence microscopy. The individual features on
the chip were 588 x 588 microns, but the lower left 5 x 5
square features in the array did not contain probes. To
quantitate the data, pixel counts were measured within each
synthesis site. Pixels represent 50 x 50 microns. The
fluorescence intensity for each feature was scaled to a mean
determined from 27 bright features. After scanning, the chip
was stripped and rehybridized; all six samples were hybridized
to the same chip. Figure 36 shows the image observed from the
mt4 sample on the DNA chip. Figure 37 shows the image
observed from the mt5 sample on the DNA chip. Figure 38 shows
the predicted difference image between the mt4 and mt5 samples
on the DNA chip based on mismatches between the two samples
and the reference sequence (see Anderson et al., supra).
Figure 39 shows the actual difference image observed.

The results show that, in almost all cases, mismatched
probe/target hybrids resulted in lower fluorescence intensity
than perfectly matched hybrids. Nonetheless, some probes
detected mutations (or specific sequences) better than others,
and in several cases, the differences were within noise
levels. Improvements can be realized by increasing the amount
of overlap between probes and hence overall probe density and,
for duplex DNA targets, using a second set of probes, either
on the same or a separate chip, corresponding to the second
strand of the target. Figure 40, in sheets 1 and 2, shows a
plot of normalized intensities across rows 10 and 11 of the
array and a tabulation of the mutations detected.

Figure 41 shows the discrimination between wild-type and
mutant hybrids obtained with this chip. The median of the six
normalized hybridization scores for each probe was taken. The
graph plots the ratio of the median score to the normalized
hybridization score versus mean counts. On this graph, a
ratio of 1.6 and mean counts above 50 yield no false
positives, and while it is clear that detection of some
mutants can be improved, excellent discrimination is achieved,
considering the small size of the array. Figure 42
illustrates how the identity of the base mismatch may
influence the ability to discriminate mutant and wild-type
sequences more than the position of the mismatch within an
oligonucleotide probe. The mismatch position is expressed as
% of probe length from the 3'-end. The base change is
indicated on the graph. These results show that the DNA chip
increases the capacity of the standard reverse dot blot format
by orders of magnitude, extending the power of that approach
many fold, and that the methods of the invention are more
efficient and easier to automate than gel-based methods of
nucleic acid sequence and mutation analysis.
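
The discrimination criterion described for Figure 41 (ratio of the median score to an individual score, together with a mean-count floor) might be applied as in the following sketch. The way the two thresholds are combined, the interpretation of "mean counts" as the mean raw count for the probe across samples, the function name, and the example numbers are all assumptions made for illustration; they are not taken from the text.

```python
from statistics import mean, median


def flag_low_scores(scores, counts, ratio_cutoff=1.6, min_mean_counts=50):
    """Return indices of samples whose score for one probe looks mismatched.

    scores: normalized hybridization scores for one probe, one per sample.
    counts: raw counts for the same probe and samples (assumed meaning
            of "mean counts").
    """
    if mean(counts) <= min_mean_counts:
        return []  # probe too dim to call under the assumed count floor
    med = median(scores)
    flagged = []
    for index, score in enumerate(scores):
        # A very low score relative to the median suggests a mismatch.
        ratio = float("inf") if score <= 0 else med / score
        if ratio >= ratio_cutoff:
            flagged.append(index)
    return flagged


# Hypothetical data for one probe across six samples.
scores = [1.00, 0.95, 1.05, 0.50, 1.10, 0.98]
counts = [120, 115, 130, 60, 125, 118]
print(flag_low_scores(scores, counts))
```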

To illustrate further these advantages, a second chip was
prepared for analyzing a longer segment from human
mitochondrial DNA (mtDNA). The chip "tiles" through 648
nucleotides of a reference sequence comprising human H strand
mtDNA from positions 16280 to 356, and allows analysis of each
nucleotide in the reference sequence. The probes in the array
are 15 nucleotides in length, and each position in the target
sequence is represented by a set of 4 probes (A, C, G, T
substitutions), which differ from one another at position 7
from the 3'-end. The array consists of 13 blocks of 4 x 50
probes; each block scans through 50 nucleotides of contiguous
mtDNA sequence. The blocks are separated by blank rows. The
4 corner columns contain control probes; there are a total of
2600 probes in a 1.28 cm x 1.28 cm square area, and each probe
area (feature) is 256 x 197 microns.
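
As a rough sketch of this layout, the following Python builds the set of four 15-mer probes for one reference position and checks the probe count quoted above. Probes are written 3' to 5' so that they align base-by-base with the reference; that convention, the function name, and the toy reference string are assumptions for illustration only.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def four_probe_set(reference, position, probe_len=15, interrogation_from_3prime=7):
    """Return {base: probe} for the four probes interrogating `position`.

    Probes are strings written 3'->5', so index 0 is the 3' end and
    position 7 from the 3' end is string index 6.
    """
    index = interrogation_from_3prime - 1
    start = position - index
    if start < 0 or start + probe_len > len(reference):
        raise ValueError("position too close to the end of the reference")
    matched = "".join(COMPLEMENT[b] for b in reference[start:start + probe_len])
    # Substitute each of A, C, G, T at the interrogation position.
    return {b: matched[:index] + b + matched[index + 1:] for b in BASES}


# Toy reference used only for illustration.
reference = "GATCACAGGTCTATCACCCTATTAACCACT"
probes = four_probe_set(reference, position=10)
for base, probe in probes.items():
    print(base, probe)

# Probe count for the layout in the text: 13 blocks of 4 x 50 probes.
total_probes = 13 * 4 * 50
print(total_probes)
```

Exactly one of the four probes in each set is fully complementary to the reference; the other three carry a single mismatch at the interrogation position.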

Target RNA was prepared as above. The RNA was fragmented
and hybridized to the oligonucleotide array in a solution
composed of 6X SSPE, 0.1% Triton X-100 for 60 minutes at 18 °C.
Unhybridized material was washed away with buffer, and the
chip was scanned at 25 micron pixel resolution.

Figure 43 provides a 5' to 3' sequence listing of one
target corresponding to the probes on the chip. X is a
control probe. Positions that differ in the target (i.e., are
mismatched with the probe at the designated site) are in bold.
Figure 44 shows the fluorescence image produced by scanning
the chip when hybridized to this sample. About 95% of the
sequence could be read correctly from only one strand of the
original duplex target nucleic acid. Although some probes did
not provide excellent discrimination and some probes did not
appear to hybridize to the target efficiently, excellent
results were achieved. The target sequence differed from the
probe set at six positions: 4 transitions and 2 insertions.
All 4 transitions were detected, and specific probes could
readily be incorporated into the array to detect insertions or
deletions. Figure 45 illustrates the detection of 4
transitions in the target sequence relative to the wild-type
probes on the chip.

A further chip was constructed comprising probes tiling
across the entire D-loop region (1.3 kb) of mtDNA sequences
from two humans. The probes were tiled in rows of four using
the basic tiling strategy. The probes were overlapping
15-mers having an interrogation position 7 nucleotides from the
3' end. The complete group of probes tiled on the reference
sequence from the first individual, designated mt1, occupied
the upper half of the chip. The lower half of the chip
contained a similar arrangement based on a second clone, mt2.
The probes were synthesized in a 1.28 x 1.28 cm area, which
contained a matrix of 115 x 120 cells. The chip contained a
total of 10,488 mtDNA probes.

Target DNA was extracted from hair roots of six
individuals. The 1.3 kb region spanning positions
15935 to 667 of human mtDNA was PCR amplified, cloned in
bacteriophage M13 and sequenced by conventional methods. The
1.3 kb region was reamplified from the phage clone using
primers L15935-T3,
5' CTCGGAATTAACCCTCACTAAAGGAAACCTTTTTCCAAGGA, and H667-T7,
5' TAATACGACTCACTATAGGGAGAGGCTAGGACCAAACCTATT, tagged with T3
and T7 RNA polymerase promoter sequences. Labelled RNA was
generated by in vitro transcription using T3 RNA polymerase
and fluoresceinated nucleotides, fragmented, and hybridized to
the mtDNA control region resequencing chip at room temperature
for 60 min, in 6xSSPE + 0.05% Triton X-100. Six washes were
carried out at room temperature, using 6xSSPE + 0.005% Triton
X-100, and the chip was read. Signal intensities varied
considerably over the chip, but the large dynamic range of the
detection system allowed accurate quantitation of intensities
over several orders of magnitude. Even relatively low signal
intensities yielded accurate results.

Five different clones (mt1-5) were hybridized, each to a
separate chip. The reference sequence was also hybridized for
comparative purposes. Mean counts per probe cell were
determined, and used by automated basecalling software to read
the sequence. The accuracy of sequence read from the chip is
summarized as follows. Combining the data from the five
targets analyzed, the chip read a total of 6310 nucleotides.
Of these nucleotides in the target sequences, 55 were
different from the reference sequence (as judged by
conventional sequencing). 41 of these 55 nucleotides were both
detected and read correctly from the chip. 6 of 55
nucleotides were detected as being ambiguous but their
identity could not be read. 2 of 55 nucleotides were detected
as mutations, but their identity was miscalled. 6 of 55
nucleotides were incorrectly called as wildtype. Of the 6255
nucleotides in the target sequence that were identical to the
reference sequence, only 36 (0.57%) were miscalled or scored
as ambiguous.
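
The tallies in this summary can be cross-checked with a short calculation. The category labels below are paraphrases chosen for this sketch; the counts are those given in the text.

```python
# Outcomes for the 55 positions that differed from the reference.
variant_calls = {
    "detected and read correctly": 41,
    "detected but ambiguous": 6,
    "detected but miscalled": 2,
    "called as wildtype": 6,
}

total_read = 6310
variant_positions = sum(variant_calls.values())
identical_positions = total_read - variant_positions
miscalled_identical = 36

# Fractions derived from the reported counts.
detected = variant_positions - variant_calls["called as wildtype"]
fraction_detected = detected / variant_positions
fraction_read_correctly = variant_calls["detected and read correctly"] / variant_positions
fraction_identical_errors = miscalled_identical / identical_positions

print(variant_positions, identical_positions)
print(f"{fraction_detected:.3f} {fraction_read_correctly:.3f} "
      f"{fraction_identical_errors:.4f}")
```

The four outcome categories account for all 55 variant positions, and subtracting them from the 6310 nucleotides read gives the 6255 identical positions quoted above.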

A further chip was constructed comprising probes tiling
across a reference sequence comprising an entire mitochondrial
genome. In this chip, a block tiling strategy was used. Each
block was designed to analyze seven nucleotides from a target
sequence. Each block consisted of four probe sets, the probe
sets each having seven probes. A block was laid down on the
chip in seven columns of four probes. The upper probe was the
same in each column, this being a probe exactly complementary
to a subsequence of the reference sequence. The three other
probes in each column were identical to the upper probe except
in an interrogation position, which was occupied by a
different base in each of the four probes in the column. The
interrogation position shifted by one position between
successive columns. Thus, except for the seven interrogation
positions, one in each of the columns of probes, all probes
occupying a block were identical. The array comprised many
such blocks, each tiled to successive subsequences of the
mitochondrial DNA reference sequence. In all, the chip tiled
15,569 nucleotides of reference sequence with double tiling at
42 positions. 66,276 probes occupied an array of 304 x 315
cells, each cell having an area of 42 x 41 microns.
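
One block of the block tiling strategy described above could be generated as in the sketch below. The probe length (15), the index of the first interrogation position (4), the toy reference string, and the function name are assumptions for illustration; the text does not state them for this chip. Probes are written 3' to 5' so that they align base-by-base with the reference.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_block(reference, start, probe_len=15, first_interrogation=4, n_columns=7):
    """Return a list of columns; each column is a list of four probes.

    The first (upper) probe in every column is the same perfectly
    complementary probe. The other three differ from it only at that
    column's interrogation position, which shifts by one per column.
    """
    window = reference[start:start + probe_len]
    if len(window) < probe_len:
        raise ValueError("reference too short for this block")
    upper = "".join(COMPLEMENT[b] for b in window)
    columns = []
    for column in range(n_columns):
        index = first_interrogation + column
        substituted = [upper[:index] + b + upper[index + 1:]
                       for b in "ACGT" if b != upper[index]]
        columns.append([upper] + substituted)
    return columns


# Toy reference used only for illustration.
block = build_block("GATCACAGGTCTATCACCCTATTAACCACT", start=0)
for column in block:
    print(" ".join(column))
```

Because every probe in a block shares the same flanking sequence, hybridization differences within a block reflect only the base at the interrogation positions.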

The chip was hybridized to the same target sequences as
described for the D-loop region, except that hybridization was
at 15 °C for 2 hr. The chip was scanned at 5 micron resolution
to give an image with approximately 64 pixels per cell. For
blocks of probes tiling across the D-loop region, a
sequence-specific hybridization pattern was obtained. For other
blocks, only background hybridization was observed.

These results illustrate that longer sequences can be
read using the DNA chips and methods of the invention, as
compared to conventional sequencing methods, where reading
length is limited by the resolution of gel electrophoresis.
Hybridization and signal detection require less than an hour
and can be readily shortened by appropriate choice of buffers,
temperatures, probes, and reagents.

III. MODES OF PRACTICING THE INVENTION 
A. VLSIPS™ Technology

As noted above, the VLSIPS™ technology is described in a
number of patent publications and is preferred for making the
oligonucleotide arrays of the invention. A brief description
of how this technology can be used to make and screen DNA
chips is provided in this Example and the accompanying
Figures. In the VLSIPS™ method, light is shone through a mask
to activate functional groups (for oligonucleotides, typically
an -OH) protected with a photoremovable protecting group
on a surface of a solid support. After light activation, a
nucleoside building block, itself protected with a
photoremovable protecting group (at the 5'-OH), is coupled to
the activated areas of the support. The process can be
repeated, using different masks or mask orientations and
building blocks, to prepare very dense arrays of many
different oligonucleotide probes. The process is illustrated
in Figure 46; Figure 47 illustrates how the process can be
used to prepare "nucleoside combinatorials" or
oligonucleotides synthesized by coupling all four nucleosides
to form dimers, trimers and so forth.

New methods for the combinatorial chemical synthesis of
peptide, polycarbamate, and oligonucleotide arrays have
recently been reported (see Fodor et al., 1991, Science 251:
767-773; Cho et al., 1993, Science 261: 1303-1305; and
Southern et al., 1992, Genomics 13: 1008-1017, each of which
is incorporated herein by reference). These arrays, or
biological chips (see Fodor et al., 1993, Nature 364: 555-556,
incorporated herein by reference), harbor specific chemical
compounds at precise locations in a high-density, information
rich format, and are a powerful tool for the study of
biological recognition processes. A particularly exciting
application of the array technology is in the field of DNA
sequence analysis. The hybridization pattern of a DNA target
to an array of shorter oligonucleotide probes is used to gain
primary structure information of the DNA target. This format
has important applications in sequencing by hybridization, DNA
diagnostics and in elucidating the thermodynamic parameters
affecting nucleic acid recognition.

Conventional DNA sequencing technology is a laborious
procedure requiring electrophoretic size separation of labeled
DNA fragments. An alternative approach, termed Sequencing By
Hybridization (SBH), has been proposed (Lysov et al., 1988,
Dokl. Akad. Nauk SSSR 303:1508-1511; Bains et al., 1988, J.
Theor. Biol. 135:303-307; and Drmanac et al., 1989, Genomics
4:114-128, incorporated herein by reference). This method
uses a set of short oligonucleotide probes of defined sequence
to search for complementary sequences on a longer target
strand of DNA. The hybridization pattern is used to
reconstruct the target DNA sequence. It is envisioned that
hybridization analysis of large numbers of probes can be used
to sequence long stretches of DNA. In immediate applications
of this hybridization methodology, a small number of probes
can be used to interrogate local DNA sequence.

The strategy of SBH can be illustrated by the following
example. A 12-mer target DNA sequence, AGCCTAGCTGAA, is mixed
with a complete set of octanucleotide probes. If only perfect
complementarity is considered, five of the 65,536 octamer
probes (TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT)
will hybridize to the target. Alignment of the overlapping
sequences from the hybridizing probes reconstructs the
complement of the original 12-mer target:

TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT
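
The worked example above can be reproduced with a short Python sketch. It lists the octamers complementary to the target (written aligned with the target, as in the text) and then chains them by their seven-base overlaps. The greedy chaining shown here is an illustrative simplification that assumes every overlap is unique; it is not presented as the reconstruction method of the cited references, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_probes(target, k=8):
    """List the k-mers of the base-by-base complement of `target`."""
    complement = target.translate(COMPLEMENT)
    return [complement[i:i + k] for i in range(len(complement) - k + 1)]


def assemble(probes):
    """Chain probes by (k-1)-base overlaps; assumes overlaps are unique."""
    probes = list(probes)
    k = len(probes[0])
    # The first probe is the one whose prefix is no other probe's suffix.
    start = next(p for p in probes
                 if all(p[:-1] != q[1:] for q in probes if q != p))
    sequence = start
    remaining = set(probes) - {start}
    while remaining:
        following = next(p for p in remaining if p[:-1] == sequence[-(k - 1):])
        sequence += following[-1]
        remaining.remove(following)
    return sequence


probes = complementary_probes("AGCCTAGCTGAA")
print(probes)
print(assemble(probes))
```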

Hybridization methodology can be carried out by attaching
target DNA to a surface. The target is interrogated with a
set of oligonucleotide probes, one at a time (see Strezoska et
al., 1991, Proc. Natl. Acad. Sci. USA 88:10089-10093, and
Drmanac et al., 1993, Science 260:1649-1652, each of which is
incorporated herein by reference). This approach can be
implemented with well established methods of immobilization
and hybridization detection, but involves a large number of
manipulations. For example, to probe a sequence utilizing a
full set of octanucleotides, tens of thousands of
hybridization reactions must be performed. Alternatively, SBH
can be carried out by attaching probes to a surface in an
array format where the identity of the probes at each site is
known. The target DNA is then added to the array of probes.
The hybridization pattern determined in a single experiment
directly reveals the identity of all complementary probes.

As noted above, a preferred method of oligonucleotide
probe array synthesis involves the use of light to direct the
synthesis of oligonucleotide probes in high-density,
miniaturized arrays. Photolabile 5'-protected
N-acyl-deoxynucleoside phosphoramidites, surface linker
chemistry, and versatile combinatorial synthesis strategies
have been developed for this technology. Matrices of
spatially-defined oligonucleotide probes have been generated,
and the ability to use these arrays to identify complementary
sequences has been demonstrated by hybridizing fluorescent
labeled oligonucleotides to the DNA chips produced by the
methods. The hybridization pattern demonstrates a high degree
of base specificity and reveals the sequence of
oligonucleotide targets.

The basic strategy for light-directed oligonucleotide
synthesis (1) is outlined in Fig. 46. The surface of a solid
support modified with photolabile protecting groups (X) is
illuminated through a photolithographic mask, yielding
reactive hydroxyl groups in the illuminated regions. A
3'-O-phosphoramidite activated deoxynucleoside (protected at
the 5'-hydroxyl with a photolabile group) is then presented to
the surface and coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate is
rinsed and the surface illuminated through a second mask, to
expose additional hydroxyl groups for coupling. A second
5'-protected, 3'-O-phosphoramidite activated deoxynucleoside
is presented to the surface. The selective photodeprotection
and coupling cycles are repeated until the desired set of
products is obtained.

Light-directed chemical synthesis lends itself to highly
efficient synthesis strategies which will generate a maximum
number of compounds in a minimum number of chemical steps.
For example, the complete set of 4^n polynucleotides (length
n), or any subset of this set, can be produced in only 4 x n
chemical steps. See Fig. 47. The patterns of illumination
and the order of chemical reactants ultimately define the
products and their locations. Because photolithography is
used, the process can be miniaturized to generate high-density
arrays of oligonucleotide probes. For an example of the
nomenclature useful for describing such arrays, an array
containing all possible octanucleotides of dA and dT is
written as (A+T)^8. Expansion of this polynomial reveals the
identity of all 256 octanucleotide probes from AAAAAAAA to
TTTTTTTT. A DNA array composed of complete sets of
dinucleotides is referred to as having a complexity of 2. The
array given by (A+T+C+G)^8 is the full 65,536 octanucleotide
array of complexity four. Computer-aided methods of laying
down predesigned arrays of probes using VLSIPS™ technology are
described in commonly-assigned co-pending application USSN
08/249,188, filed May 24, 1994 (incorporated by reference in
its entirety for all purposes).
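
The step count for combinatorial synthesis can be illustrated with a small simulation. In this sketch each synthesis site is assigned its intended product in advance, and each step stands for one mask exposure followed by one coupling: the mask exposes exactly those sites whose next base matches the nucleoside being coupled. The site ordering, the function name, and the neglect of synthesis direction and coupling efficiency are simplifying assumptions.

```python
from itertools import product


def synthesize_all(length, alphabet="ACGT"):
    """Simulate masked synthesis of every sequence of `length` over `alphabet`.

    Returns (products, steps), where steps counts mask/coupling cycles.
    """
    intended = ["".join(p) for p in product(alphabet, repeat=length)]
    grown = [""] * len(intended)
    steps = 0
    for position in range(length):
        for base in alphabet:
            steps += 1  # one illumination through a mask plus one coupling
            for site, sequence in enumerate(intended):
                if sequence[position] == base:  # site exposed by this mask
                    grown[site] += base
    return grown, steps


# The (A+T)^8 array: 256 octanucleotides in 2 x 8 = 16 steps.
at_probes, at_steps = synthesize_all(8, alphabet="AT")
print(len(at_probes), at_steps, at_probes[0], at_probes[-1])

# All 4^3 trinucleotides in 4 x 3 = 12 steps.
trimers, trimer_steps = synthesize_all(3)
print(len(trimers), trimer_steps)
```

The number of products grows as the alphabet size raised to the probe length, while the number of steps grows only as their product.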

To carry out hybridization of DNA targets to the probe
arrays, the arrays are mounted in a thermostatically
controlled hybridization chamber. Fluorescein labeled DNA
targets are injected into the chamber and hybridization is
allowed to proceed for 5 min to 24 hr. The surface of the
matrix is scanned in an epifluorescence microscope (Zeiss
Axioscop 20) equipped with photon counting electronics using
50-100 µW of 488 nm excitation from an Argon ion laser
(Spectra Physics Model 2020). Measurements may be made with
the target solution in contact with the probe matrix or after
washing. Photon counts are stored and image files are
presented after conversion to an eight bit image format. See
Fig. 51.

When hybridizing a DNA target to an oligonucleotide
array, N = Lt - (Lp - 1) complementary hybrids are expected,
where N is the number of hybrids, Lt is the length of the DNA
target, and Lp is the length of the oligonucleotide probes on
the array. For example, for an 11-mer target hybridized to an
octanucleotide array, N = 4. Hybridizations with mismatches
at positions that are 2 to 3 residues from either end of the
probes will generate detectable signals. Modifying the above
expression for N, one arrives at a relationship estimating the
number of detectable hybridizations (Nd) for a DNA target of
length Lt and an array of complexity C. Assuming an average
of 5 positions giving signals above background:

Nd = (1 + 5(C - 1))[Lt - (Lp - 1)].
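
The two expressions above are easy to evaluate directly. The function and parameter names in this sketch are illustrative; the formulas are those given in the text.

```python
def expected_hybrids(target_len, probe_len):
    """N = Lt - (Lp - 1): perfectly complementary hybrids."""
    return max(target_len - (probe_len - 1), 0)


def detectable_hybrids(target_len, probe_len, complexity, signal_positions=5):
    """Nd = (1 + 5(C - 1)) [Lt - (Lp - 1)], with 5 signal-giving positions."""
    return ((1 + signal_positions * (complexity - 1))
            * expected_hybrids(target_len, probe_len))


# 11-mer target on an octanucleotide array, as in the text.
print(expected_hybrids(11, 8))
# 12-mer target on an octanucleotide array (the earlier SBH example).
print(expected_hybrids(12, 8))
# Same 11-mer on an array of complexity four.
print(detectable_hybrids(11, 8, complexity=4))
```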

Arrays of oligonucleotides can be efficiently generated
by light-directed synthesis and can be used to determine the
identity of DNA target sequences. Because combinatorial
strategies are used, the number of compounds increases
exponentially while the number of chemical coupling cycles
increases only linearly. For example, synthesizing the
complete set of 4^8 (65,536) octanucleotides will add only four
hours to the synthesis for the 16 additional cycles.
Furthermore, combinatorial synthesis strategies can be
implemented to generate arrays of any desired composition.
For example, because the entire set of dodecamers (4^12) can be
produced in 48 photolysis and coupling cycles (b^n compounds
require b x n cycles), any subset of the dodecamers
(including any subset of shorter oligonucleotides) can be
constructed with the correct lithographic mask design in 48 or
fewer chemical coupling steps. In addition, the number of
compounds in an array is limited only by the density of
synthesis sites and the overall array size. Recent
experiments have demonstrated hybridization to probes
synthesized in 25 µm sites. At this resolution, the entire
set of 65,536 octanucleotides can be placed in an array
measuring 0.64 cm square, and the set of 1,048,576
decanucleotides requires only a 2.56 cm array.
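
The array dimensions quoted above follow from the site size and the number of probes, assuming a square array with one probe per site. The helper below is a sketch with a hypothetical name; it rounds up when the probe count is not a perfect square.

```python
import math


def array_side_cm(n_probes, site_um=25):
    """Side length in cm of a square array holding n_probes sites."""
    sites_per_side = math.isqrt(n_probes)
    if sites_per_side * sites_per_side < n_probes:
        sites_per_side += 1  # round up to fit every probe
    return sites_per_side * site_um / 10_000  # 10,000 microns per cm


print(array_side_cm(65_536))     # 256 sites per side
print(array_side_cm(1_048_576))  # 1024 sites per side
```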

Genome sequencing projects will ultimately be limited by
DNA sequencing technologies. Current sequencing methodologies
are highly reliant on complex procedures and require
substantial manual effort. Sequencing by hybridization has
the potential for transforming many of the manual efforts into
more efficient and automated formats. Light-directed
synthesis is an efficient means for large scale production of
miniaturized arrays for SBH. The oligonucleotide arrays are
not limited to primary sequencing applications. Because
single base changes cause multiple changes in the
hybridization pattern, the oligonucleotide arrays provide a
powerful means to check the accuracy of previously elucidated
DNA sequence, or to scan for changes within a sequence. In
the case of octanucleotides, a single base change in the
target DNA results in the loss of eight complements, and
generates eight new complements. Matching of hybridization
patterns may be useful in resolving sequencing ambiguities
from standard gel techniques, or for rapidly detecting DNA
mutational events. The potentially very high information
content of light-directed oligonucleotide arrays will change
genetic diagnostic testing. Sequence comparisons of hundreds
to thousands of different genes will be assayed simultaneously
instead of the current one- or few-at-a-time format. Custom
arrays can also be constructed to contain genetic markers for
the rapid identification of a wide variety of pathogenic
organisms.
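
The effect of a single base change on the octanucleotide hybridization pattern can be illustrated as follows. The sketch counts, position by position, the eight-base windows of the target whose complementary probe changes. The toy target and the function name are assumptions; windows are counted by position, so a target containing repeats could yield fewer distinct probes than windows.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def changed_probes(target, position, new_base, k=8):
    """Return (start, lost_probe, gained_probe) for each affected window."""
    mutant = target[:position] + new_base + target[position + 1:]
    changes = []
    for start in range(len(target) - k + 1):
        old = target[start:start + k]
        new = mutant[start:start + k]
        if old != new:
            changes.append((start,
                            old.translate(COMPLEMENT),
                            new.translate(COMPLEMENT)))
    return changes


# Toy 24-mer target used only for illustration.
target = "AGCCTAGCTGAATCGGTACCATGC"
interior = changed_probes(target, position=12, new_base="C")
near_end = changed_probes(target, position=2, new_base="T")
print(len(interior), len(near_end))
for start, lost, gained in interior:
    print(start, lost, gained)
```

A change at least seven bases from either end of the target affects eight windows; a change closer to an end affects fewer, as the second call shows.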

Oligonucleotide arrays can also be applied to study the
sequence specificity of RNA or protein-DNA interactions.
Experiments can be designed to elucidate specificity rules of
non-Watson-Crick oligonucleotide structures or to investigate
the use of novel synthetic nucleoside analogs for antisense or
triple helix applications. Suitably protected RNA monomers
may be employed for RNA synthesis. The oligonucleotide arrays
should find broad application deducing the thermodynamic and
kinetic rules governing formation and stability of
oligonucleotide complexes.

Other than the use of photoremovable protecting groups,
the nucleoside coupling chemistry is very similar to that used
routinely today for oligonucleotide synthesis. Fig. 48 shows
the deprotection, coupling, and oxidation steps of a solid
phase DNA synthesis method. Fig. 49 shows an illustrative
synthesis route for the nucleoside building blocks used in the
method. Fig. 50 shows a preferred photoremovable protecting
group, MeNPOC, and how to prepare the group in active form.
The procedures described below show how to prepare these
reagents. The nucleoside building blocks are
5'-MeNPOC-THYMIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYCYTIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYGUANOSINE-3'-OCEP; and 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYADENOSINE-3'-OCEP.



1. Preparation of 4,5-methylenedioxy-2-nitroacetophenone




A solution of 50 g (0.305 mole) 3,4-methylenedioxyacetophenone
(Aldrich) in 200 mL glacial acetic acid was added
dropwise over 30 minutes to 700 mL of cold (2-4 °C) 70% HNO3
with stirring (NOTE: the reaction will overheat without
external cooling from an ice bath, which can be dangerous and
lead to side products. At temperatures below 0°C, however,
the reaction can be sluggish. A temperature of 3-5°C seems to
be optimal). The mixture was left stirring for another 60
minutes at 3-5°C, and then allowed to approach ambient
temperature. Analysis by TLC (25% EtOAc in hexane) indicated
complete conversion of the starting material within 1-2 hr.
When the reaction was complete, the mixture was poured into ~3
liters of crushed ice, and the resulting yellow solid was
filtered off, washed with water and then suction-dried. Yield
~53 g (84%), used without further purification.



2. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)ethanol

[Reaction scheme: NaBH4, EtOH]



Sodium borohydride (10 g; 0.27 mol) was added slowly to a cold,
stirring suspension of 53 g (0.25 mol) of
4,5-methylenedioxy-2-nitroacetophenone in 400 mL methanol.
The temperature was kept below 10°C by slow addition of the
NaBH4 and external cooling with an ice bath. Stirring was
continued at ambient temperature for another two hours, at
which time TLC (CH2Cl2) indicated complete conversion of the
ketone. The mixture was poured into one liter of ice-water
and the resulting suspension was neutralized with ammonium
chloride and then extracted three times with 400 mL CH2Cl2 or
EtOAc (the product can be collected by filtration and washed
at this point, but it is somewhat soluble in water and this
results in a yield of only ~60%). The combined organic
extracts were washed with brine, then dried with MgSO4 and
evaporated. The crude product was purified from the main
byproduct by dissolving it in a minimum volume of CH2Cl2 or
THF (~175 ml) and then precipitating it by slowly adding hexane
(1000 ml) while stirring (yield 51 g; 80% overall). It can
also be recrystallized (e.g., toluene-hexane), but this
reduces the yield.
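
The mole quantities and yields quoted in procedures 1 and 2 can be
cross-checked with a short calculation. The Python sketch below is
illustrative only: the molecular formulas (C9H8O3 for the starting
acetophenone, C9H7NO5 for the nitro ketone, C9H9NO5 for the alcohol)
are assumed from the compound names, the atomic masses are rounded
standard values, and the helper names are hypothetical.

```python
# Rounded standard atomic masses in g/mol (assumed values for this check).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Return the molar mass (g/mol) for a dict of element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def percent_yield(product_g, product_mw, limiting_mol):
    """Percent yield of a product relative to the limiting reagent."""
    return 100.0 * (product_g / product_mw) / limiting_mol


# Formulas assumed from the compound names in procedures 1 and 2.
acetophenone = {"C": 9, "H": 8, "O": 3}          # 3,4-methylenedioxyacetophenone
nitro_ketone = {"C": 9, "H": 7, "N": 1, "O": 5}  # 4,5-methylenedioxy-2-nitroacetophenone
alcohol = {"C": 9, "H": 9, "N": 1, "O": 5}       # 1-(4,5-methylenedioxy-2-nitrophenyl)ethanol

start_mol = 50.0 / molar_mass(acetophenone)                           # 50 g charged
step1_yield = percent_yield(53.0, molar_mass(nitro_ketone), start_mol)
overall_yield = percent_yield(51.0, molar_mass(alcohol), start_mol)

print(f"starting material: {start_mol:.3f} mol")
print(f"nitration yield:   {step1_yield:.0f}%")
print(f"overall yield:     {overall_yield:.0f}%")
```

With these assumed formulas the calculation gives roughly 0.30 mole of
starting material and yields within a percentage point or two of the
84% and 80% figures stated above; the small differences are rounding.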
3. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)ethyl
chloroformate (MeNPOC-Cl)

Phosgene (500 mL of 20% w/v in toluene from Fluka: 965 mmole;
4 eq.) was added slowly to a cold, stirring solution of 50 g
(237 mmole; 1 eq.) of 1-(4,5-methylenedioxy-2-nitrophenyl)
ethanol in 400 mL dry THF. The solution was stirred overnight
at ambient temperature at which point TLC (20% Et2O/hexane)
indicated >95% conversion. The mixture was evaporated (an
oil-less pump with downstream aqueous NaOH trap is recommended
to remove the excess phosgene) to afford a viscous brown oil.
Purification was effected by flash chromatography on a short
(9 x 13 cm) column of silica gel eluted with 20% Et2O/hexane.
Typically 55 g (85%) of the solid yellow MeNPOC-Cl is obtained
by this procedure. The crude material has also been
recrystallized in 2-3 crops from 1:1 ether/hexane. On this
scale, ~100 ml is used for the first crop, with a few percent
THF added to aid dissolution, and then cooling overnight at
-20°C (this procedure has not been optimized). The product
should be stored desiccated at -20°C.
4. Synthesis of 5'-MeNPOC-2'-deoxynucleoside-3'-
(N,N-diisopropyl 2-cyanoethyl phosphoramidites)
(a.) 5'-MeNPOC-Nucleosides

Base = Thymidine (T); N-4-isobutyryl 2'-deoxycytidine (ibu-dC);
N-2-phenoxyacetyl 2'-deoxyguanosine (PAC-dG); and
N-6-phenoxyacetyl 2'-deoxyadenosine (PAC-dA)

All four of the 5'-MeNPOC nucleosides were prepared from the
base-protected 2'-deoxynucleosides by the following procedure.
The protected 2'-deoxynucleoside (90 mmole) was dried by
co-evaporating twice with 250 mL anhydrous pyridine. The
nucleoside was then dissolved in 300 mL anhydrous pyridine (or
1:1 pyridine/DMF, for the dG PAC nucleoside) under argon and
cooled to ~2°C in an ice bath. A solution of 24.6 g (90
mmole) MeNPOC-Cl in 100 mL dry THF was then added with
stirring over 30 minutes. The ice bath was removed, and the
solution allowed to stir overnight at room temperature (TLC:
5-10% MeOH in CH2Cl2; two diastereomers). After evaporating
the solvents under vacuum, the crude material was taken up in
250 mL ethyl acetate and extracted with saturated aqueous
NaHCO3 and brine. The organic phase was then dried over
Na2SO4, filtered and evaporated to obtain a yellow foam. The
crude products were finally purified by flash chromatography
(9 x 30 cm silica gel column eluted with a stepped gradient of
2% - 6% MeOH in CH2Cl2). Yields of the purified diastereomeric
mixtures are in the range of 65-75%.
(b.) 5'-MeNPOC-2'-deoxynucleoside-3'-(N,N-diisopropyl
2-cyanoethyl phosphoramidites)




The four deoxynucleosides were phosphitylated using either
2-cyanoethyl-N,N-diisopropyl chlorophosphoramidite, or
2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite. The
following is a typical procedure. Add 16.6 g (17.4 ml; 55
mmole) of 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphoro-
diamidite to a solution of 50 mmole 5'-MeNPOC-nucleoside and
4.3 g (25 mmole) diisopropylammonium tetrazolide in 250 mL dry
CH2Cl2 under argon at ambient temperature. Continue stirring
for 4-16 hours (reaction monitored by TLC: 45:45:10
hexane/CH2Cl2/Et3N). Wash the organic phase with saturated
aqueous NaHCO3 and brine, then dry over Na2SO4, and evaporate
to dryness. Purify the crude amidite by flash chromatography
(9 x 25 cm silica gel column eluted with hexane/CH2Cl2/TEA -
45:45:10 for A, C, T; or 0:90:10 for G). The yield of
purified amidite is about 90%.



B. PREPARATION OF LABELED DNA/HYBRIDIZATION TO ARRAY

1. PCR

PCR amplification reactions are typically conducted in a
mixture composed of, per reaction: 1 µl genomic DNA; 10 µl
each primer (10 pmol/µl stocks); 10 µl 10 x PCR buffer (100 mM
Tris.Cl pH8.5, 500 mM KCl, 15 mM MgCl2); 10 µl 2 mM dNTPs
(made from 100 mM dNTP stocks); 2.5 U Taq polymerase (Perkin
Elmer AmpliTaq™, 5 U/µl); and H2O to 100 µl. The cycling
conditions are usually 40 cycles (94°C 45 sec, 55°C 30 sec,
72°C 60 sec) but may need to be varied considerably from
sample type to sample type. These conditions are for 0.2 mL
thin wall tubes in a Perkin Elmer 9600 thermocycler. See
Perkin Elmer 1992/93 catalogue for 9600 cycle time
information. Target, primer length and sequence composition,
among other factors, may also affect parameters.

For products in the 200 to 1000 bp size range, check 2 µl
of the reaction on a 1.5% 0.5x TBE agarose gel using an
appropriate size standard (phiX174 cut with HaeIII is
convenient). The PCR reaction should yield several picomoles
of product. It is helpful to include a negative control
(i.e., 1 µl TE instead of genomic DNA) to check for possible
contamination. To avoid contamination, keep PCR products from
previous experiments away from later reactions, using filter
tips as appropriate. Using a set of working solutions and
storing master solutions separately is helpful, so long as one
does not contaminate the master stock solutions.

For simple amplifications of short fragments from genomic
DNA it is, in general, unnecessary to optimize Mg2+
concentrations. A good procedure is the following: make a
master mix minus enzyme; dispense the genomic DNA samples to
individual tubes or reaction wells; add enzyme to the master
mix; and mix and dispense the master solution to each well,
using a new filter tip each time.
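
The master-mix procedure above can be planned with a short script.
The following Python sketch scales the per-reaction volumes given in
this section to a batch of reactions. It is illustrative only: the
10% pipetting overage, the 5 U/µl Taq stock concentration (taken from
the linear amplification section), and the function names are
assumptions rather than part of the protocol.

```python
# Per-reaction volumes in microliters for a 100 ul PCR, excluding the
# 1 ul of genomic DNA that is dispensed separately into each tube.
PER_REACTION_UL = {
    "forward primer (10 pmol/ul)": 10.0,
    "reverse primer (10 pmol/ul)": 10.0,
    "10x PCR buffer": 10.0,
    "2 mM dNTPs": 10.0,
    "Taq polymerase (5 U/ul, assumed stock)": 0.5,  # 2.5 U per reaction
}
TEMPLATE_UL = 1.0
TOTAL_UL = 100.0


def master_mix(n_reactions, overage=0.10):
    """Return total volume (ul) of each master-mix component.

    overage is an assumed fractional excess to cover pipetting losses.
    """
    scale = n_reactions * (1.0 + overage)
    water_per_reaction = TOTAL_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    mix = {name: volume * scale for name, volume in PER_REACTION_UL.items()}
    mix["H2O"] = water_per_reaction * scale
    return mix


def dispense_volume():
    """Volume of complete master mix (ul) added to each 1 ul DNA sample."""
    return TOTAL_UL - TEMPLATE_UL


if __name__ == "__main__":
    for component, volume in master_mix(8).items():
        print(f"{component:40s} {volume:7.1f} ul")
    print(f"dispense {dispense_volume():.0f} ul per tube")
```

Following the order of addition described above, the enzyme volume
would be held back until the genomic DNA has been dispensed and then
added to the master mix just before it is distributed.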

2. Purification

Removal of unincorporated nucleotides and primers from
PCR samples can be accomplished using the Promega Magic PCR
Preps DNA purification kit. One can purify the whole sample,
following the instructions supplied with the kit (proceed from
section IIIB, 'Sample preparation for direct purification from
PCR reactions'). After elution of the PCR product in 50 µl of
TE or H2O, one centrifuges the eluate for 20 sec at 12,000 rpm
in a microfuge and carefully transfers 45 µl to a new
microfuge tube, avoiding any visible pellet. Resin is
sometimes carried over during the elution step. This transfer
prevents accidental contamination of the linear amplification
reaction with 'Magic PCR' resin. Other methods, e.g., size
exclusion chromatography, may also be used.
3. Linear amplification

In a 0.2 mL thin-wall PCR tube mix: 4 µl purified PCR
product; 2 µl primer (10 pmol/µl); 4 µl 10 x PCR buffer; 4 µl
dNTPs (2 mM dA, dC, dG, 0.1 mM dT); 4 µl 0.1 mM dUTP; 1 µl 1
mM fluorescein dUTP (Amersham RPN 2121); 1 U Taq polymerase
(Perkin Elmer, 5 U/µl); and add H2O to 40 µl. Conduct 40
cycles (92°C 30 sec, 55°C 30 sec, 72°C 90 sec) of PCR. These
conditions have been used to amplify a 300 nucleotide
mitochondrial DNA fragment but are applicable to other
fragments. Even in the absence of a visible product band on
an agarose gel, there should still be enough product to give
an easily detectable hybridization signal. If one is not
treating the DNA with uracil DNA glycosylase (see Section 4),
dUTP can be omitted from the reaction.

4. Fragmentation

Purify the linear amplification product using the Promega
Magic PCR Preps DNA purification kit, as per Section 2 above.
In a 0.2 mL thin-wall PCR tube mix: 40 µl purified labeled
DNA; 4 µl 10 x PCR buffer; and 0.5 µl uracil DNA glycosylase
(BRL 1 U/µl). Incubate the mixture 15 min at 37°C, then 10 min
at 97°C; store at -20°C until ready to use.

5. Hybridization, Scanning & Stripping 

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer
is removed from the flow cell and replaced with 1 mL of
(fragmented) DNA in hybridization buffer and mixed well. The
scan is performed in the presence of the labeled target. Fig.
51 shows an illustrative detection system for scanning a
DNA chip. A series of scans at 30 min intervals using a
hybridization temperature of 25°C yields a very clear signal,
usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain maximum
counts in the range of hundreds to low thousands/pixel for a
new slide. When finished, the slide can be stripped using 50%
formamide, rinsing well in deionized H2O, blowing dry, and
storing at room temperature.

C. PREPARATION OF LABELED RNA/HYBRIDIZATION TO ARRAY

1. Tagged primers

The primers used to amplify the target nucleic acid
should have promoter sequences if one desires to produce RNA
from the amplified nucleic acid. Suitable promoter sequences
are shown below and include:
(1) the T3 promoter sequence:
    5'-CGGAATTAACCCTCACTAAAGG
    5'-AATTAACCCTCACTAAAGGGAG;
(2) the T7 promoter sequence:
    5'-TAATACGACTCACTATAGGGAG;
and (3) the SP6 promoter sequence:
    5'-ATTTAGGTGACACTATAGAA.

The desired promoter sequence is added to the 5' end of the
PCR primer. It is convenient to add a different promoter to
each primer of a PCR primer pair so that either strand may be
transcribed from a single PCR product.
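
Constructing a tagged primer amounts to prepending the promoter to
the target-specific sequence. The Python sketch below shows this with
the promoter sequences listed above (the second T3 variant is used).
The function name and the 20-nucleotide target-specific sequences are
hypothetical placeholders, not sequences from this application.

```python
# Promoter sequences as listed in the text, written 5' to 3'.
PROMOTERS = {
    "T3": "AATTAACCCTCACTAAAGGGAG",
    "T7": "TAATACGACTCACTATAGGGAG",
    "SP6": "ATTTAGGTGACACTATAGAA",
}


def tag_primer(promoter, target_specific):
    """Return a primer with the named promoter added to its 5' end."""
    sequence = target_specific.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    return PROMOTERS[promoter] + sequence


# Hypothetical 20-nucleotide target-specific sequences, for illustration only.
forward = tag_primer("T3", "ACGTACGTACGTACGTACGT")
reverse = tag_primer("T7", "TGCATGCATGCATGCATGCA")

print(forward, len(forward))
print(reverse, len(reverse))
```

Tagging the two primers of a pair with different promoters, as here,
allows either strand to be transcribed from the same PCR product. A
22-nucleotide promoter joined to a 20-nucleotide target-specific
sequence gives a 42-nucleotide primer.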

Synthesize PCR primers so as to leave the DMT group on.
DMT-on purification is unnecessary for PCR but appears to be
important for transcription. Add 25 µl 0.5 M NaOH to
collection vial prior to collection of oligonucleotide to keep
the DMT group on. Deprotect using standard chemistry; 55°C
overnight is convenient.

HPLC purification is accomplished by drying down the
oligonucleotides, resuspending in 1 mL 0.1 M TEAA (dilute 2.0
M stock in deionized water, filter through 0.2 micron filter)
and filtering through a 0.2 micron filter. Load 0.5 mL on reverse
phase HPLC (column can be a Hamilton PRP-1 semi-prep, #79426).
The gradient is 0 -> 50% CH3CN over 25 min (program 0.2
µmol prep. 0-50, 25 min). Pool the desired fractions, dry down,
resuspend in 200 µl 80% HAc, and leave 30 min at RT. Add 200 µl
EtOH; dry down. Resuspend in 200 µl H2O, plus 20 µl NaAc pH5.5,
600 µl EtOH. Leave 10 min on ice; centrifuge 12,000 rpm for 10
min in microfuge. Pour off supernatant. Rinse pellet with 1 mL
EtOH, dry, resuspend in 200 µl H2O. Dry, resuspend in 200 µl
TE. Measure A260, prepare a 10 pmol/µl solution in TE (10 mM
Tris.Cl pH 8.0, 0.1 mM EDTA). Following HPLC purification of
a 42 mer, a yield in the vicinity of 15 nmol from a 0.2 µmol
scale synthesis is typical.

2. Genomic DNA Preparation 

Add 500 µl (10 mM Tris.Cl pH8.0, 10 mM EDTA, 100 mM
NaCl, 2% (w/v) SDS, 40 mM DTT, filter sterilized) to the
sample. Add 1.25 µl 20 mg/ml proteinase K (Boehringer).
Incubate at 55°C for 2 hours, vortexing once or twice.
Perform 2x 0.5 mL 1:1 phenol:CHCl3 extractions. After each
extraction, centrifuge 12,000 rpm 5 min in a microfuge and
recover 0.4 mL supernatant. Add 35 µl NaAc pH5.2 plus 1 mL
EtOH. Place sample on ice 45 min; then centrifuge 12,000 rpm
30 min, rinse, air dry 30 min, and resuspend in 100 µl TE.

3. PCR 

PCR is performed in a mixture containing, per reaction:
1 µl genomic DNA; 4 µl each primer (10 pmol/µl stocks); 4 µl
10 x PCR buffer (100 mM Tris.Cl pH8.5, 500 mM KCl, 15 mM
MgCl2); 4 µl 2 mM dNTPs (made from 100 mM dNTP stocks); 1 U
Taq polymerase (Perkin Elmer, 5 U/µl); H2O to 40 µl. About 40
cycles (94°C 30 sec, 55°C 30 sec, 72°C 30 sec) are performed,
but cycling conditions may need to be varied. These conditions
are for 0.2 mL thin wall tubes in Perkin Elmer 9600. For
products in the 200 to 1000 bp size range, check 2 µl of the
reaction on a 1.5% 0.5x TBE agarose gel using an appropriate
size standard. For larger or smaller volumes (20 - 100 µl),
one can use the same amount of genomic DNA but adjust the
other ingredients accordingly.

4. In vitro transcription

Mix: 3 µl PCR product; 4 µl 5x buffer; 2 µl DTT; 2.4 µl
10 mM rNTPs (100 mM solutions from Pharmacia); 0.48 µl 10 mM
fluorescein-UTP (Fluorescein-12-UTP, 10 mM solution, from
Boehringer Mannheim); 0.5 µl RNA polymerase (Promega T3 or T7
RNA polymerase); and add H2O to 20 µl. Incubate at 37°C for 3
h. Check 2 µl of the reaction on a 1.5% 0.5x TBE agarose gel
using a size standard. 5x buffer is 200 mM Tris pH 7.5, 30 mM
MgCl2, 10 mM spermidine, 50 mM NaCl, and 100 mM DTT (supplied
with enzyme). The PCR product needs no purification and can
be added directly to the transcription mixture. A 20 µl
reaction is suggested for an initial test experiment and
hybridization; a 100 µl reaction is considered "preparative"
scale (the reaction can be scaled up to obtain more target).
The amount of PCR product to add is variable; typically a PCR
reaction will yield several picomoles of DNA. If the PCR
reaction does not produce that much target, then one should
increase the amount of DNA added to the transcription reaction
(as well as optimize the PCR). The ratio of fluorescein-UTP
to UTP suggested above is 1:5, but ratios from 1:3 to 1:10
all work well. One can also label with biotin-UTP and detect
with streptavidin-FITC to obtain similar results as with
fluorescein-UTP detection.
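
The 1:5 labeling ratio follows directly from the volumes in the mix,
and other ratios in the 1:3 to 1:10 range can be set the same way.
The Python sketch below is illustrative only; it assumes that the
rNTP mix is 10 mM in each nucleotide (so that UTP is at 10 mM in the
stock), and the function names are hypothetical.

```python
def final_mM(stock_mM, volume_ul, total_ul):
    """Final concentration (mM) after diluting a stock into the reaction."""
    return stock_mM * volume_ul / total_ul


def labeled_utp_volume(rntp_ul, unlabeled_per_labeled,
                       rntp_stock_mM=10.0, label_stock_mM=10.0):
    """Volume (ul) of fluorescein-UTP stock giving a 1:N labeled:unlabeled ratio.

    Assumes UTP is present at rntp_stock_mM in the rNTP mix.
    """
    utp_nmol = rntp_stock_mM * rntp_ul          # mM x ul = nmol
    labeled_nmol = utp_nmol / unlabeled_per_labeled
    return labeled_nmol / label_stock_mM


# Values for the 20 ul test-scale reaction described in the text.
utp_final = final_mM(10.0, 2.4, 20.0)
label_final = final_mM(10.0, 0.48, 20.0)
label_ul_1_to_5 = labeled_utp_volume(2.4, 5)
label_ul_1_to_10 = labeled_utp_volume(2.4, 10)

print(f"UTP {utp_final:.2f} mM, fluorescein-UTP {label_final:.2f} mM")
print(f"1:5 needs {label_ul_1_to_5:.2f} ul; 1:10 needs {label_ul_1_to_10:.2f} ul")
```

Under that assumption, the 20 µl reaction contains 1.2 mM UTP and
0.24 mM fluorescein-UTP, which is the 1:5 ratio stated above. For a
100 µl preparative reaction, all volumes scale by a factor of five.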

For nondenaturing agarose gel electrophoresis of RNA,
note that the RNA band will normally migrate somewhat faster
than the DNA template band, although sometimes the two bands
will comigrate. The temperature of the gel can affect the
migration of the RNA band. The RNA produced from in vitro
transcription is quite stable and can be stored for months (at
least) at -20°C without any evidence of degradation. It can
be stored in unsterilized 6x SSPE / 0.1% Triton X-100 at -20°C
for days (at least) and reused twice (at least) for
hybridization, without taking any special precautions in
preparation or during use. RNase contamination should of
course be avoided. When extracting RNA from cells, it is
preferable to work very rapidly and to use strongly denaturing
conditions. Avoid using glassware previously contaminated
with RNases. Use of new disposable plasticware (not
necessarily sterilized) is preferred, as new plastic tubes,
tips, etc., are essentially RNase free. Treatment with DEPC
or autoclaving is typically not necessary.
5. Fragmentation

Heat the transcription mixture at 94°C for 40 min.
The extent of fragmentation is controlled by varying Mg2+
concentration (30 mM is typical), temperature, and duration of
heating.

6. Hybridization, Scanning, & Stripping 

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer
is removed from the flow cell and replaced with 1 mL of
(hydrolysed) RNA in hybridization buffer and mixed well.
Incubate for 15-30 min at 18°C. Remove the hybridization
solution, which can be saved for subsequent experiments.
Rinse the flow cell 4-5 times with fresh changes of 6 x SSPE
/ 0.1% Triton X-100, equilibrated to 18°C. The rinses can be
performed rapidly, but it is important to empty the flow cell
before each new rinse and to mix the liquid in the cell
thoroughly. A series of scans at 30 min intervals using a
hybridization temperature of 25°C yields a very clear signal,
usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain maximum
counts in the range of hundreds to low thousands/pixel for a
new slide. When finished, the slide can be stripped using
warm water.

These conditions are illustrative and assume a probe
length of ~15 nucleotides. The stripping conditions suggested
are fairly severe, but some signal may remain on the slide if
the washing is not stringent. Nevertheless, the counts
remaining after the wash should be very low in comparison to
the signal in presence of target RNA. In some cases, much
gentler stripping conditions are effective. The lower the
hybridization temperature and the longer the duration of
hybridization, the more difficult it is to strip the slide.
Longer targets may be more difficult to strip than shorter
targets.

7. Amplification of Signal 

A variety of methods can be used to enhance detection of
labelled targets bound to a probe on the array. In one
embodiment, the protein MutS (from E. coli) or equivalent
proteins such as yeast MSH1, MSH2, and MSH3; mouse Rep-3; and
Streptococcus Hex-A, is used in conjunction with target
hybridization to detect probe-target complexes that contain
mismatched base pairs. The protein, labeled directly or
indirectly, can be added to the chip during or after
hybridization of target nucleic acid, and differentially binds
to homo- and heteroduplex nucleic acid. A wide variety of
dyes and other labels can be used for similar purposes. For
instance, the dye YOYO-1 is known to bind preferentially to
nucleic acids containing sequences comprising runs of 3 or
more G residues.

8. Detection of Repeat Sequences 

In some circumstances, e.g., target nucleic acids with
repeated sequences or with high G/C content, very long probes
are sometimes required for optimal detection. In one
embodiment for detecting specific sequences in a target
nucleic acid with a DNA chip, repeat sequences are detected as
follows. The chip comprises probes of length sufficient to
extend into the repeat region varying distances from each end.
The sample, prior to hybridization, is treated with a labelled
oligonucleotide that is complementary to a repeat region but
shorter than the full length of the repeat. The target
nucleic acid is labelled with a second, distinct label. After
hybridization, the chip is scanned for probes that have bound
both the labelled target and the labelled oligonucleotide
probe; the presence of such bound probes shows that at least
two repeat sequences are present.

While the foregoing invention has been described in some
detail for purposes of clarity and understanding, it will be
clear to one skilled in the art from a reading of this
disclosure that various changes in form and detail can be made
without departing from the true scope of the invention. All
publications and patent documents cited in this application
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual publication
or patent document were so individually denoted.
Insert A after 266 | GACAGAAACAAAAAAAaCAATCrmAAACAGAC | 855/684 | 360
2789+5G>A | i14b | 36 | 1.10% | Sub G>A 5 one after text | CTCCTTGGAAAGTGA/OAJTATTDCATQTTXTA | 886/886 | 374
3272-26A>G | i17a | 226 | rare | Sub A>G 26 before 17b | TTTATG 1 TAT 1 J LiCA(A>U> 1 U T 1 1 1 CT ATGGAAA | 782/901 | 414
3272-93T>C | i17a | 228 | rare | Sub T>C 83 before 17b | A TTTGTGATATGATTArr^^TCTAATTTAGTCTTTl | 762/901 | 414
R1066C | 17b | 228 | rare | Substitute C>T at 57 | AGGACTA1 GGACACI l|Of)OrCCCTTCGGACQGC^ | 782/901 | 414
L1077P | 17b | 228 | rare | Substitute T>C at 91 | TTACTTTGAAACTCfTX^GTTCCACAAA^ | 782/801 | 414
Y1092X | 17b | 228 | 0.50% | Substitute C>A at 137 | CCAACIGbl ILI tUIAX^wgCTGTCAACACTGOG | 782/801 | 414
M1101K | 17b | 228 | moi (65%) | Substitute T>A at 163 | TGCGC rQGTT CCAAA/T>AX3AGAATAGAAATGAT | 782/801 | 414
R1162X | 19 | 240 | 0.90% | Substitute C>T at 16 | ATGCGATCTGTGAGCfOT)GAii!t;i I IAAGTTC | 784/785 | 366
3659delC | 19 | 249 | 0.80% | Delete C at 59 | AAGGTAAACCTAcCAAGTCAACCAAAOCATACA | 764/765 | 356
3849+4A>G | i19 | 249 | 1.00% | Sub A>G 4 after text base | TCCTGQCCAGA<3GGTG(A>GK^TTTGAACACT | 784/785 | 356
3849+10kb | i19 | 10kb | 1.40% | Sub C>T EcoRI Fragment | ATAAAATGG(OT)GAG7AAQACA | 792/791 | 450
W1282R | 20 | 156 | rare | Substitute T>C at 127 | AATAACTTTGCAACA<^><^G<JAfiGAAAGCCTTT | 764/768 | 351
W1282X | 20 | 156 | 2.10% | Substitute G>A at 129 | MTAACTTTGCAACAGTBrT^k)AGOAAAOCCTrT | 764/786 | 351
3905insT | 20 | 156 | 2.10% | Insert T at 56 | C (1 1 G II A I CAUCT T mTa^AGACT ACTGAACAC | 764/786 | 351
4005+1G>A | i20 | 156 | Mancneeaw | Sub G>A after Exon 20 | AGTGATACCACAGfOA^TGAGCAAAAGOACTT | 764/786 | 351
N1303K | 21 | 90 | 1.80% | Substitute C>G at 36 | CATTTAGAAAAAA/OGlTTGQATCCCTATXlAAC | 766/793 | 396
N1303H | 21 | 90 | rare | Substitute A>C at 34 | CATTTAGAAAA lAX^CTTGGATCCCT ATQAAC

WHAT IS CLAIMED IS:
General tiling claims

1. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least two sets of
oligonucleotide probes,
    (1) a first probe set comprising a plurality of
probes, each probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of the
reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence,
    (2) a second probe set comprising a corresponding
probe for each probe in the first probe set, the corresponding
probe in the second probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is occupied by a
different nucleotide in each of the two corresponding probes
from the first and second probe sets;
    wherein the probes in the first probe set have at least
two interrogation positions respectively corresponding to each
of two contiguous nucleotides in the reference sequence.

2. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least four sets of
oligonucleotide probes,
    (1) a first probe set comprising a plurality of
probes, each probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of the
reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence,
    (2) second, third and fourth probe sets, each
comprising a corresponding probe for each probe in the first
probe set, the probes in the second, third and fourth probe
sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes the at
least one interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets.
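
By way of illustration only, and without limiting the claims, the
four probe sets recited above can be sketched in Python. The probe
length of 15, the interrogation position index of 7, the reference
sequence, and the function names below are all assumptions chosen for
the example; each probe is written 5' to 3' as the reverse complement
of a window of the reference sequence.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def tile(reference, probe_len=15, interrogation=7):
    """Build one column of four probes per window of the reference.

    interrogation is a 0-based index into the probe (5' to 3').
    Returns (ref_index, wild_type_base, probes) tuples, where probes
    maps each of A, C, G, T to the probe carrying that base at the
    interrogation position.
    """
    columns = []
    for start in range(len(reference) - probe_len + 1):
        wild_type = reverse_complement(reference[start:start + probe_len])
        probes = {
            base: wild_type[:interrogation] + base + wild_type[interrogation + 1:]
            for base in "ACGT"
        }
        # Reference nucleotide interrogated by this column.
        ref_index = start + probe_len - 1 - interrogation
        columns.append((ref_index, wild_type[interrogation], probes))
    return columns


# Hypothetical 20-nucleotide reference sequence, for illustration only.
reference = "ACGTTGCAAGGCTTAACCGT"
columns = tile(reference)
for ref_index, wt_base, probes in columns:
    print(ref_index, reference[ref_index], wt_base, probes[wt_base])
```

In each column, the probe carrying the wild-type base corresponds to
a probe of the first probe set, and the three probes carrying the
other bases correspond to the second, third and fourth probe sets.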

3. The oligonucleotide array of claim 2, further
comprising a fifth probe set comprising a corresponding probe
for each probe in the first probe set, the corresponding probe
from the fifth probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is deleted in the
corresponding probe from the fifth probe set.

4. The oligonucleotide array of claim 2, further
comprising a sixth probe set comprising a corresponding probe
for each probe in the first probe set, the corresponding probe
from the sixth probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
an additional nucleotide is inserted adjacent to the at least
one interrogation position in the corresponding probe from the
first probe set.

5. The array of claim 2, wherein the first probe set has
at least three interrogation positions respectively
corresponding to each of three contiguous nucleotides in a
reference sequence.

6. The array of claim 2, wherein the first probe set has
at least 50 interrogation positions respectively corresponding
to each of 50 contiguous nucleotides in a reference sequence.

7. The array of claim 1 or 2, wherein the first probe
set has at least 100 interrogation positions respectively
corresponding to each of 100 contiguous nucleotides in a
reference sequence.

8. The oligonucleotide array of claim 1 or 2, wherein
the first probe set has an interrogation position
corresponding to each of at least 30% of the nucleotides in a
reference sequence and the reference sequence comprises at
least 100 nucleotides.

9. The oligonucleotide array of claim 8, wherein the
first probe set comprises probes which completely span the
reference sequence, which probes relative to the reference
sequence, overlap one another in sequence.

10. The oligonucleotide array of claim 9, wherein the
first probe set has an interrogation position corresponding to
each of the nucleotides in the reference sequence.

11. The oligonucleotide array of claim 10, wherein the
probes are oligodeoxyribonucleotides.

12. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 100 and 10,000 probes.

13. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 10,000 and 100,000 probes.

14. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 100,000 and 10,000,000 probes.

15. The oligonucleotide array of claim 1 or 2, wherein
the probes are linked to the support via a spacer.

16. The oligonucleotide array of claim 1 or 2, wherein
the segment in each probe of the first probe set that is
exactly complementary to the subsequence of the reference
sequence is 9-21 nucleotides.
17. The oligonucleotide array of claim 16, wherein the
segment is n nucleotides long, and the subsequence is at least
n-2 nucleotides long.

18. The oligonucleotide array of claim 1 or 2, wherein
each probe of the first probe set consists of the segment that
is exactly complementary to the subsequence of the reference
sequence.

19. The oligonucleotide array of claim 1 or 2, wherein
the probes in the second, third and fourth probe sets are
identical to the corresponding probe from the first probe set
except that the at least one interrogation position is
occupied by a different nucleotide in each of the four
corresponding probes from the four probe sets.

20. The array of claim 2, further comprising fifth,
sixth and seventh probe sets, wherein:
    the segment of each probe in the first set
includes at least two interrogation positions each
corresponding to a nucleotide in the reference sequence,
    the second, third and fourth probe sets, each
comprise a corresponding probe for each probe in the first
probe set, the corresponding probes in the second, third and
fourth probe sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes a first
interrogation position except that the first interrogation
position is occupied by a different nucleotide in each of the
four corresponding probes from the four probe sets;
    the fifth, sixth and seventh probe sets, each
comprising a corresponding probe for each probe in the first
probe set, the probes in the fifth, sixth and seventh probe
sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes a second
interrogation position, except that the second interrogation
position is occupied by a different nucleotide in each of the
four corresponding probes from the four probe sets.

1 21. The array of claim 2, wherein each probe in the 

2 first probe set further comprises a second segment of at least 

3 three nucleotides exactly complementary to a second 

4 subsequence of the reference sequence, and the probes from the 

5 second, third and fourth probe sets comprise the corresponding 

6 probe from the first probe set or a subsequence thereof 

7 comprising the first and second segments except in the at 

8 least one interrogation position. 

1 22. The array of claim 2, further comprising: 

2 a fifth probe set comprising at least one probe 

3 comprising a segment of at least seven nucleotides exactly 

4 complementary to a subsequence of the reference sequence 

5 except at one or two positions, the segment including at least 

6 one interrogation position corresponding to a nucleotide in 

7 the reference sequence not at the one or two positions; 

8 sixth, seventh and eighth probe sets, each comprising a 

9 probe for each probe in the fifth probe set, the corresponding 

10 probes from the sixth, seventh & eighth probe sets being 

11 identical to a sequence comprising the corresponding probe 

12 from the fifth probe set or a subsequence of at least nine 

13 nucleotides thereof including the at least one interrogation 

14 position and the one or two positions, except in the at least 

15 one interrogation position, which is occupied by a different 

16 nucleotide in each of the four probes. 

1 23. The array of claim 2, wherein the probes are 

2 arranged on the substrate so that the first set of probes is 

3 arranged in a row across the substrate in an order reflecting 

4 the overlap between the probes and the reference sequence, and 

5 the additional sets of probes are arranged in columns relative 

6 to the probes in said first set, so that probes with the same 

7 interrogation position are in the same column and so that each 

8 column comprises at least 4 probes. 
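
The row-and-column arrangement recited above can be sketched in Python. This is only an illustrative sketch: the function name layout_grid, the toy strand "ACGTAC", and the chosen length and position are assumptions made for the example, not values from the claims. Probes are written as plain strings, and each column of the grid holds the four probes that share one interrogation position.

```python
NUCLEOTIDES = "ACGT"

def layout_grid(probe_strand, length, position):
    """Return {nucleotide: row}, where row[c] is the probe in column c whose
    interrogation position (1-based `position`) holds that nucleotide.

    `probe_strand` is the strand from which the perfectly matched probes are
    cut as overlapping windows (hypothetical input for this sketch).
    """
    index = position - 1
    windows = [probe_strand[s:s + length]
               for s in range(len(probe_strand) - length + 1)]
    return {base: [w[:index] + base + w[index + 1:] for w in windows]
            for base in NUCLEOTIDES}

grid = layout_grid("ACGTAC", length=4, position=2)
for base in NUCLEOTIDES:
    # One printed line per row; columns follow the overlap order of the windows.
    print(base, " ".join(grid[base]))
```

In this sketch the perfectly matched probe of each column sits in whichever row carries the unmodified nucleotide, so every column contains four distinct probes.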

1 24. The array of Claim 2, wherein said probes are 12 to 

2 17 nucleotides in length. 

1 25. The array of Claim 2, wherein said probes are 15 

2 nucleotides in length and attached by a covalent linkage to a 

3 site on a 3'-end of said probes, and said interrogation 

4 position is located at position 7, relative to the 3'-end of 

5 said probes. 

1 26. The array of claim 2, further comprising fifth, 

2 sixth, seventh and eighth probe sets: 

3 (1) a fifth probe set comprising a plurality of 

4 probes, each probe comprising a segment of at least three 

5 nucleotides exactly complementary to a subsequence of a second 

6 reference sequence, the segment including at least one 

7 interrogation position complementary to a corresponding 

8 nucleotide in the reference sequence, 

9 (2) the sixth, seventh, and eighth probe sets, each 

10 comprising a corresponding probe for each probe in the fifth 

11 probe set, the probes in the sixth, seventh and eighth probe 

12 sets being identical to a sequence comprising the 

13 corresponding probe from the fifth probe set or a subsequence 

14 of at least three nucleotides thereof that includes the at 

15 least one interrogation position, except that the at least one 

16 interrogation position is occupied by a different nucleotide 

17 in each of the four corresponding probes from the fifth, 

18 sixth, seventh and eighth probe sets. 

1 27. The array of claim 22, wherein the first, second, 

2 third and fourth probe sets have probes of a first length and 

3 the fifth, sixth, seventh and eighth probe sets have probes of 

4 a second length different from the first length. 

Tiling for wildtype and mutant reference sequences 

1 28. An array of oligonucleotide probes immobilized on a 

2 solid support, the array comprising at least one pair of first 

3 and second probe groups, each group comprising a first and 

4 second sets of oligonucleotide probes as defined by claim 1; 

5 wherein each probe in the first probe set from the 

6 first group is exactly complementary to a subsequence of a 

7 first reference sequence and each probe in the first probe set 

8 from the second group is exactly complementary to a 

9 subsequence from a second reference sequence. 

1 29. The array of claim 28, wherein the second reference 

2 sequence is a mutated form of the first reference sequence. 

1 30. The array of claim 28, wherein each group further 

2 comprises third and fourth probe sets, each comprising a 

3 corresponding probe for each probe in the first probe set, the 

4 probes in the second, third and fourth probe sets being 

5 identical to a sequence comprising the corresponding probe 

6 from the first probe set or a subsequence of at least three 

7 nucleotides thereof that includes the interrogation position, 

8 except that the interrogation position is occupied by a 

9 different nucleotide in each of the four corresponding probes 
10 from the four probe sets. 



1 31. The array of claim 30 that comprises at least five 

2 pairs of first and second probe groups, wherein the probes in 

3 the first probe sets from the first groups of the five pairs 

4 are exactly complementary to subsequences from five different 

5 respective first reference sequences. 

1 32. The array of claim 30 that comprises at least forty 

2 pairs of first and second probe groups, wherein the probes in 

3 the first probe sets from the first groups of the forty pairs 

4 are exactly complementary to subsequences from forty 

5 respective first reference sequences. 

Block tiling 

1 33. An array of oligonucleotide probes immobilized on a 

2 solid support, the array comprising at least a group of probes 

3 comprising: 

4 a wildtype probe comprising a segment of at least three 

5 nucleotides exactly complementary to a subsequence of a 

6 reference sequence, the segment having at least first and 

7 second interrogation positions corresponding to first and 

8 second nucleotides in the reference sequence, 

9 a first set of three mutant probes, each identical to a 

10 sequence comprising the wildtype probe or a subsequence of at 

11 least three nucleotides thereof including the first and second 

12 interrogation positions, except in the first interrogation 

13 position, which is occupied by a different nucleotide in each 

14 of the three mutant probes and the wildtype probe; 

15 a second set of three mutant probes, each identical to a 

16 sequence comprising the wildtype probe or a subsequence of at 

17 least three nucleotides thereof including the first and second 

18 interrogation positions, except in the second interrogation 

19 position, which is occupied by a different nucleotide in each 

20 of the three mutant probes and the wildtype probe. 
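
The block tiling group recited above (one wildtype probe plus a set of three mutant probes per interrogation position) can be sketched as follows. The function name block_tiling and the five-nucleotide example probe are hypothetical and chosen only for illustration. Positions are 0-based string indices in this sketch.

```python
def block_tiling(wildtype, positions):
    """For each interrogation position, return the three mutant probes that
    match `wildtype` everywhere except at that position."""
    sets = {}
    for p in positions:
        sets[p] = [wildtype[:p] + base + wildtype[p + 1:]
                   for base in "ACGT" if base != wildtype[p]]
    return sets

# Toy wildtype probe with two interrogation positions (indices 1 and 2).
mutant_sets = block_tiling("ACGTA", positions=[1, 2])
for position, probes in mutant_sets.items():
    print(position, probes)
```

Together with the wildtype probe, each set of three mutants places a different nucleotide at its interrogation position in each of four probes, while the wildtype probe is shared between the sets.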

1 34. The array of claim 33, wherein the segment of the 

2 wildtype probe comprises 3-20 interrogation positions 

3 corresponding to 3-20 respective nucleotides in the reference 

4 sequence, and the array comprises 3-20 respective sets of 

5 three mutant probes, each of the three probes identical to a 

6 sequence comprising the wildtype probe or a subsequence 

7 thereof including the 3-20 interrogation positions, except 

8 that one of the 3-20 interrogation positions is occupied by a 

9 different nucleotide in each of the three mutant probes and 

10 the wildtype probe, the one of the 3-20 interrogation 

11 positions being different in each of the 3-20 respective sets 

12 of three mutant probes. 

1 35. An array of probes immobilized to a solid support 

2 comprising two groups of probes, each group as defined by 

3 claim 33, a first group comprising a wildtype probe comprising 

4 a segment exactly complementary to a subsequence of a first 

5 reference sequence and a second group comprising a wildtype 

6 probe comprising a segment exactly complementary to a 

7 subsequence of a second reference sequence. 

1 36. The array of claim 35, comprising at least 10-100 

2 groups of probes, each comprising a wildtype probe comprising 

3 a segment exactly complementary to a subsequence of at least 

4 10-100 respective reference sequences. 

Pooled probes 

1 37. A method of comparing a target sequence with a 

2 reference sequence, the method comprising: 

3 identifying variants of a reference sequence differing 

4 from the reference sequence in at least one nucleotide; 

5 assigning each variant a designation, 

6 providing an array of pools of probes, each pool 

7 occupying a separate cell of the array, wherein each pool 

8 comprises a probe comprising a segment exactly complementary 

9 to each variant sequence assigned a particular designation, 

10 contacting the array with a target sequence comprising a 

11 variant of the reference sequence; 

12 determining the relative hybridization intensities of the 

13 pools in the array to the target sequence; 

14 determining the target sequence from the relative 

15 hybridization intensities of the pools. 

1 38. The method of claim 37, wherein the variants are 

2 assigned numbers according to an error code. 

1 39. The method of claim 37, wherein each variant is 

2 assigned a designation having at least one digit and at least 

3 one value for the digit, and each pool comprises a probe 

4 comprising a segment exactly complementary to each variant 

5 sequence assigned a particular value in a particular digit. 

1 40. The method of claim 39, wherein the variants are 

2 assigned successive numbers in a numbering system of base m 

3 having n digits, and the array comprises n x (m-1) pools of 

4 probes. 

41. The method of claim 40, wherein each pool further 
comprises a probe comprising a segment exactly complementary 
to the reference sequence. 
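
The numbering scheme of the pooled-probe claims can be sketched as follows. This is one possible reading, labeled here as an assumption: each variant number is written in base m with n digits, one pool exists for every nonzero value of every digit (giving n x (m-1) pools), and a digit for which no pool hybridizes is read as zero. The function names to_digits, build_pools and decode are hypothetical.

```python
def to_digits(number, base, n_digits):
    """Digits of `number` in `base`, least significant first."""
    digits = []
    for _ in range(n_digits):
        number, digit = divmod(number, base)
        digits.append(digit)
    return digits

def build_pools(n_variants, base, n_digits):
    """Map (digit index, nonzero value) -> set of variant numbers in that pool."""
    pools = {(d, v): set() for d in range(n_digits) for v in range(1, base)}
    for variant in range(n_variants):
        for d, v in enumerate(to_digits(variant, base, n_digits)):
            if v:  # assumption: value 0 needs no pool
                pools[(d, v)].add(variant)
    return pools

def decode(lit_pools, base, n_digits):
    """Recover the variant number from the pools that hybridized."""
    digits = [0] * n_digits
    for d, v in lit_pools:
        digits[d] = v
    return sum(v * base ** d for d, v in enumerate(digits))

pools = build_pools(n_variants=9, base=3, n_digits=2)
print(len(pools))  # n x (m-1) pools for n = 2, m = 3
```

Under these assumptions, nine variants are resolved with four pools rather than nine separate cells.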

Trellis tiling 

42. A pooled probe comprising a segment exactly 
complementary to a subsequence of a reference sequence except 
at a first interrogation position occupied by a pooled 
nucleotide N, a second interrogation position occupied by a 
pooled nucleotide selected from the group of three consisting 
of (1) M or K, (2) R or Y and (3) S or W, and a third 
interrogation position occupied by a second pooled nucleotide 
selected from the group, wherein the pooled nucleotide 
occupying the second interrogation position comprises a 
nucleotide complementary to a corresponding nucleotide from 
the reference sequence when the second pooled probe and 
reference sequence are maximally aligned, and the pooled 
nucleotide occupying the third interrogation position 
comprises a nucleotide complementary to a corresponding 
nucleotide from the reference sequence when the third pooled 
probe and the reference sequence are maximally aligned, 
wherein N is A, C, G or T(U), K is G or T(U), M is A or C, R 
is A or G, Y is C or T(U), W is A or T(U) and S is G or C. 
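
The pooled nucleotide symbols defined above can be expanded programmatically. The sketch below uses the definitions given in the claim (with T standing for T(U)); the function name expand_pooled and the example probe strings are hypothetical.

```python
from itertools import product

# Pooled nucleotide symbols as defined in the claim; T stands for T(U).
POOLED = {
    "N": "ACGT",
    "K": "GT",
    "M": "AC",
    "R": "AG",
    "Y": "CT",
    "W": "AT",
    "S": "GC",
}

def expand_pooled(probe):
    """List every individual sequence represented by a pooled probe."""
    choices = [POOLED.get(symbol, symbol) for symbol in probe]
    return ["".join(combination) for combination in product(*choices)]

# A toy pooled probe with N, M and R at its three interrogation positions.
components = expand_pooled("ANCMGRT")
print(len(components))
```

A pooled probe with one N and two two-fold pooled nucleotides therefore stands for 4 x 2 x 2 = 16 individual sequences.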

43. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising: 

first, second and third cells respectively occupied by 
first, second and third pooled probes, each pooled probe 
comprising a segment exactly complementary to a subsequence of 
a reference sequence except at a first interrogation position 
occupied by a pooled nucleotide N, a second interrogation 
position occupied by a pooled nucleotide selected from the 
group of three consisting of (1) M or K, (2) R or Y and (3) S 
or W, and a third interrogation position occupied by a second 
pooled nucleotide selected from the group, wherein the pooled 
nucleotide occupying the second interrogation position 
comprises a nucleotide complementary to a corresponding 
nucleotide from the reference sequence when the pooled probe 

15 and the reference sequence are maximally aligned, and the 

16 pooled nucleotide occupying the third interrogation position 

17 comprises a nucleotide complementary to a corresponding 

18 nucleotide from the reference sequence when the pooled probe 

19 and the reference sequence are maximally aligned; 

20 provided that one of the three interrogation 

21 positions in each of the three pooled probes is aligned 

22 with the same corresponding nucleotide in the reference 

23 sequence, this interrogation position being occupied by an N 

24 in one of the pooled probes, and a different pooled nucleotide 

25 in each of the other two pooled probes, 

26 wherein N is A, C, G or T(U), K is G or T(U), M is A 

27 or C, R is A or G, Y is C or T(U), W is A or T(U) and S is G 

28 or C. 



1 44. The array of claim 43 further comprising: 

2 fourth and fifth cells respectively occupied by fourth 

3 and fifth pooled probes, each pooled probe as defined by 

4 claim 43, 

5 wherein one of the three interrogation positions in the 

6 second, third and fourth pooled probes is aligned with the 

7 same corresponding nucleotide in the reference sequence, this 

8 interrogation position being occupied by an N in one of the 

9 pooled probes, and a different pooled nucleotide in each of 

10 the other two pooled probes, 

11 wherein one of the three interrogation positions in the 

12 third, fourth and fifth pooled probes is aligned with the same 

13 corresponding nucleotide in the reference sequence, this 

14 interrogation position being occupied by an N in one of the 

15 pooled probes, and a different pooled nucleotide in each of 

16 the other two pooled probes. 

1 45. The array of claim 44, wherein the pooled probes are 

2 identical except at the interrogation positions. 

1 46. The array of claim 44, wherein the first, second, 

2 third, fourth and fifth pooled probes are exactly 

3 complementary to five respective subsequences of the reference 
sequence that differ from each other by increments of one 
nucleotide. 

Bridge tiling 

47. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least four probes: 

a first probe comprising first and second segments, each 
of at least three nucleotides and exactly complementary to 
first and second subsequences of a reference sequence, the 
segments including at least one interrogation position 
corresponding to a nucleotide in the reference sequence, 
wherein either (1) the first and second subsequences are 
noncontiguous, or (2) the first and second subsequences are 
contiguous and the first and second segments are inverted 
relative to the complement of the first and second 
subsequences in the reference sequence; 

second, third and fourth probes, identical to a sequence 
comprising the first probe or a subsequence thereof comprising 
at least three nucleotides from each of the first and second 
segments, except in the at least one interrogation position, 
which differs in each of the probes. 

48. The array of claim 47, wherein the first and second 
subsequences are separated by one or two nucleotides in the 
reference sequence. 
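
A bridging probe of the noncontiguous kind recited above can be sketched as two segments joined across a short gap. The function names bridge_probe and interrogation_variants, the toy strand, and the segment lengths are hypothetical. The strand is assumed to be the complement of the reference written so that string indices run along the probe.

```python
def bridge_probe(strand, start, first_len, gap, second_len):
    """Join two segments of `strand` that are separated by `gap` nucleotides."""
    second_start = start + first_len + gap
    return (strand[start:start + first_len]
            + strand[second_start:second_start + second_len])

def interrogation_variants(probe, index):
    """Four probes that differ only at the interrogation position `index`."""
    return [probe[:index] + base + probe[index + 1:] for base in "ACGT"]

# Two three-nucleotide segments separated by two skipped nucleotides.
first_probe = bridge_probe("TTTATAGTAGAA", start=0, first_len=3, gap=2, second_len=3)
probe_set = interrogation_variants(first_probe, index=1)
print(first_probe, probe_set)
```

One of the four variants is the first probe itself; the other three correspond to the second, third and fourth probes, which differ from it only at the interrogation position.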

Two interrogation positions (no wildtype) 

49. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least a set of four 
probes, each of the probes comprising a segment of at least 7 
nucleotides that is exactly complementary to a subsequence 
from a reference sequence, except that the segment may or may 
not be exactly complementary at two interrogation positions, 
wherein: 

the first interrogation position is occupied by a 
different nucleotide in each of the four probes, 

the second interrogation position is occupied by a 
different nucleotide in each of the four probes, 

12 in first and second probes, the segment is exactly 

13 complementary to the subsequence, except at not more than one 

14 of the interrogation positions, and 

15 in third and fourth probes, the segment is exactly 

16 complementary to the subsequence, except at both of the 

17 interrogation positions. 
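
One way to construct a set of four probes satisfying the conditions above is sketched below. The pairing rule is an assumption made for the example: the probe that matches at the first interrogation position is deliberately given a mismatch at the second, so that two probes carry a single mismatch and two carry mismatches at both positions. The names two_position_set and mismatches and the seven-nucleotide segment are hypothetical.

```python
def two_position_set(segment, p1, p2):
    """Build four probes from the perfectly complementary `segment`, varying
    the nucleotides at string indices p1 and p2 (p1 != p2)."""
    w1, w2 = segment[p1], segment[p2]
    first = [w1] + [b for b in "ACGT" if b != w1]
    others = [b for b in "ACGT" if b != w2]
    # Put the matching nucleotide for position 2 in the second probe, not the first.
    second = [others[0], w2] + others[1:]
    probes = []
    for b1, b2 in zip(first, second):
        letters = list(segment)
        letters[p1], letters[p2] = b1, b2
        probes.append("".join(letters))
    return probes

def mismatches(probe, segment):
    """Count positions at which the probe departs from the perfect complement."""
    return sum(a != b for a, b in zip(probe, segment))

probes = two_position_set("ACGTACG", p1=2, p2=4)
print([(p, mismatches(p, "ACGTACG")) for p in probes])
```

No probe in the resulting set is a perfect match, yet each interrogation position is still occupied by all four nucleotides across the set.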

1 50. An array of probes immobilized to a support, the 

2 array comprising at least 100 sets of 4 probes, each set as 

3 defined by claim 49, the probes from the at least 100 sets 

4 comprising at least 100 respective segments, the segments 

5 having at least 100 respective first and second interrogation 

6 positions. 

Helper mutations 

1 51. An array of oligonucleotide probes immobilized on a 

2 solid support, the array comprising a set of probes 

3 comprising: 

4 a first probe comprising a segment of at least 7 

5 nucleotides exactly complementary to a subsequence of a 

6 reference sequence except at one or two positions, the segment 

7 including an interrogation position not at the one or two 

8 positions; 

9 second, third and fourth mutant probes, each identical to 

10 a sequence comprising the wildtype probe or a subsequence 

11 thereof including the interrogation position and the one or 

12 two positions, except in the interrogation position, which is 

13 occupied by a different nucleotide in each of the four probes. 

Omission of Perfectly Matched Probe 

1 52. An array of oligonucleotide probes immobilized on a 

2 solid support, the array comprising at least two sets of 

3 oligonucleotide probes, 

4 (1) a first probe set comprising a plurality of 

5 probes, each probe comprising a segment exactly complementary 

6 to a subsequence of at least 3 nucleotides of a reference 

7 sequence except at an interrogation position, 

8 (2) a second probe set comprising a corresponding 

9 probe for each probe in the first probe set, the corresponding 

10 probe in the second probe set being identical to a sequence 

11 comprising the corresponding probe from the first probe set or 

12 a subsequence of at least three nucleotides thereof that 

13 includes the interrogation position, except that the 

14 interrogation position is occupied by a different nucleotide 

15 in each of the two corresponding probes and the complement to 

16 the reference sequence, 

17 wherein the probes in the first probe set have at 

18 least three interrogation positions respectively corresponding 

19 to each of three contiguous nucleotides in the reference 

20 sequence. 

Methods 

1 53. A method of comparing a target nucleic acid with a 

2 reference sequence comprising a predetermined sequence of 

3 nucleotides, the method comprising: 

4 (a) hybridizing the target nucleic acid to an array 

5 of oligonucleotide probes immobilized on a solid support, the 

6 array comprising: 

7 (1) a first probe set comprising a plurality of 

8 probes, each probe comprising a segment of at least three 

9 nucleotides exactly complementary to a subsequence of the 

10 reference sequence, the segment including at least one 

11 interrogation position complementary to a corresponding 

12 nucleotide in the reference sequence, 

13 (2) a second probe set comprising a corresponding 

14 probe for each probe in the first probe set, the corresponding 

15 probe in the second probe set being identical to a sequence 

16 comprising the corresponding probe from the first probe set or 

17 a subsequence of at least three nucleotides thereof that 

18 includes the at least one interrogation position, except that 

19 the at least one interrogation position is occupied by a 

20 different nucleotide in each of the two corresponding probes 

21 from the first and second probe sets; 

22 wherein, the probes in the first probe set have at 

23 least three interrogation positions respectively corresponding 

24 to each of at least three nucleotides in the reference 

25 sequence, and 

26 (b) determining which probes, relative to one 



27 another, in the array bind specifically to the target nucleic 

28 acid, the relative specific binding of the probes indicating 

29 whether the target sequence is the same or different from the 

30 reference sequence. 
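
The determining step can be sketched as a comparison of hybridization intensities among the probes that share an interrogation position. The sketch assumes, for illustration only, that the probe with the highest intensity in each column is taken as the specifically bound probe and that the target nucleotide is its complement. The intensity values are invented for the example, and the names call_bases and differences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def call_bases(columns):
    """`columns` is a list of dicts, one per interrogation position, mapping the
    nucleotide in the probe's interrogation position to a measured intensity.
    Returns the inferred target nucleotides as a string."""
    calls = []
    for intensities in columns:
        brightest = max(intensities, key=intensities.get)
        calls.append(COMPLEMENT[brightest])
    return "".join(calls)

def differences(reference, called):
    """Positions where the called target departs from the reference."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, called)) if r != c]

# Invented intensities for three interrogation positions.
columns = [
    {"A": 5, "C": 90, "G": 4, "T": 6},
    {"A": 80, "C": 3, "G": 7, "T": 2},
    {"A": 4, "C": 6, "G": 85, "T": 9},
]
called = call_bases(columns)
print(called, differences("GTA", called))
```

An empty list of differences would indicate that the target matches the reference at every interrogated position.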



1 54. The method of claim 53, wherein the array further 

2 comprises third and fourth probe sets, each comprising a 

3 corresponding probe for each probe in the first probe set, the 

4 probes in the second, third and fourth probe sets being 

5 identical to a sequence comprising the corresponding probe 

6 from the first probe set or a subsequence of at least three 

7 nucleotides thereof that includes the at least one 

8 interrogation position, except that the at least one 

9 interrogation position is occupied by a different nucleotide 

10 in each of the four corresponding probes from the four probe 

11 sets. 

1 55. The method of claim 54, wherein the target sequence 

2 has a substituted nucleotide relative to the reference 

3 sequence in at least one undetermined position, and the 

4 relative specific binding of the probes indicates the location 

5 of the position and the nucleotide occupying the position in 

6 the target sequence. 

1 56. The method of claim 54, wherein: 

2 the hybridizing step comprises hybridizing the 

3 target nucleic acid and a second target nucleic acid to the 

4 array; and 

5 the determining step comprises determining which 

6 probes, relative to one another, in the array bind 

7 specifically to the target nucleic acid or the second target 

8 nucleic acid, the relative specific binding of the probes 

9 indicating whether the target sequence is the same or 

10 different from the reference sequence and whether the second 

11 target sequence is the same or different from the reference 

12 sequence. 

1 57. The method of claim 56, wherein the target sequence 

2 has a label and the second target sequence has a second label 

3 different from the label. 

1 58. The method of claim 56, wherein undetermined first 

2 and second proportions of the first and second target 

3 sequences are hybridized to the array and the specific binding 

4 indicates the proportions. 

1 59. The method of claim 54, further comprising: 

2 (c) removing the target nucleic acid from the array; 

3 (d) hybridizing a second target nucleic acid to the 

4 array; 

5 (e) determining which probes, relative to one another, in 

6 the array bind specifically to the second target nucleic acid, 

7 the relative specific binding of the probes indicating whether 

8 the second target sequence is the same or different from the 

9 reference sequence. 

1 60. A method of comparing a target nucleic acid with a 

2 reference sequence comprising a predetermined sequence of 

3 nucleotides, the method comprising: 

4 hybridizing the target sequence to the array of 

5 claim 28; 

6 determining which probes in the first group, 

7 relative to one another, hybridize to the target sequence, the 

8 relative specific binding of the probes indicating whether the 

9 target sequence is the same or different from the first 

10 reference sequence; 

11 determining which probes in the second group, 

12 relative to one another, hybridize to the target sequence, the 

13 relative specific binding of the probes indicating whether the 

14 target sequence is the same or different from the second 

15 reference sequence. 

1 61. The method of claim 60, wherein the hybridizing step 

2 comprises hybridizing the target sequence and a second target 

3 sequence to the array, and the relative specific binding of 

4 the probes from the first group indicates that the target is 

5 identical to the first reference sequence, and the relative 

6 specific binding of the probes from the second group indicates 

7 that the second target sequence is identical to the second 

8 reference sequence. 

1 62. The method of claim 61, wherein the first and second 

2 target sequences are heterozygous alleles of a gene. 

Comparative hybridization 

1 63. A method of comparing a target nucleic acid with a 

2 reference sequence comprising a predetermined sequence of 

3 nucleotides, the method comprising: 

4 (a) hybridizing the reference sequence to an array 

5 of oligonucleotide probes immobilized on a solid support, the 

6 array comprising: 

7 (1) a first probe set comprising a plurality of 

8 probes, each probe comprising a segment of at least 3 

9 nucleotides exactly complementary to a subsequence of the 

10 reference sequence except in at least one interrogation 

11 position; 

12 (2) a second probe set comprising a corresponding 

13 probe for each probe in the first probe set, the corresponding 

14 probe in the second probe set being identical to a sequence 

15 comprising the corresponding probe from the first probe set or 

16 a subsequence of at least three nucleotides thereof that 

17 includes the at least one interrogation position, except that 

18 the at least one interrogation position is occupied by a 

19 different nucleotide in each of the two corresponding probes 

20 from the first and second probe sets; and 



21 (b) determining which probes, relative to one 

22 another, in the array bind specifically to the reference 

23 sequence; 

24 (c) hybridizing a target sequence to the array; 

25 (d) determining which probes, relative to one 

26 another, in the array bind specifically to the target 

27 sequence; 

28 wherein the relative specific binding of the probes 

29 to the reference and the target sequence indicates whether the 

30 reference sequence is the same or different from the target 

31 sequence. 

1 64. The method of claim 63, wherein the reference 

2 sequence has a first label and the second reference sequence 

3 has a second label different from the first label, and steps 

4 (a) and (c) are performed simultaneously. 

HIV Chip 

1 65. The array of claim 2, wherein the reference sequence 

2 is from a human immunodeficiency virus. 

1 66. The array of claim 65, wherein the reference 

2 sequence is from a reverse transcriptase gene of the human 

3 immunodeficiency virus. 

1 67. The array of claim 66, wherein the reference 

2 sequence is from a protease gene of the human immunodeficiency 

3 virus. 

1 68. The array of claim 66, wherein the reference 

2 sequence is a full-length reverse transcriptase gene. 

1 69. The array of claim 68 comprising at least 3200 

2 oligonucleotide probes. 

1 70. The array of claim 66, wherein the HIV gene is from 

2 the BRU HIV strain. 

1 71. The array of claim 66, wherein the HIV gene is from 

2 the SF2 HIV strain. 

1 72. The array of claim 28, wherein the reference 

2 sequence is from the coding strand of a reverse transcriptase 

3 gene of a human immunodeficiency virus and the second 

4 reference sequence is from the noncoding strand of the reverse 

5 transcriptase gene. 

1 73. The array of claim 28, wherein the first reference 

2 sequence is from a reverse transcriptase gene of a human 

3 immunodeficiency virus and the second reference sequence 

4 comprises a subsequence of the first reference sequence with a 

5 substitution of at least one nucleotide. 

1 74. The array of claim 73, wherein the substitution 

2 confers drug resistance to a human immunodeficiency virus 

3 comprising the second reference sequence. 

1 75. The array of claim 28, wherein the first and second 

2 reference sequences are from a reverse transcriptase gene from 

3 first and second strains of a human immunodeficiency virus. 

1 76. The array of claim 28, wherein the first reference 

2 sequence is from a reverse transcriptase gene of a human 

3 immunodeficiency virus and the second reference sequence is 

4 from a 16S RNA, or DNA encoding the 16S RNA, from a pathogenic 

5 microorganism. 

1 77. The array of claim 28, wherein the first reference 

2 sequence is from a reverse transcriptase gene of a human 

3 immunodeficiency virus and the second reference sequence is 

4 from a protease gene of the human immunodeficiency virus. 

1 78. The method of claim 54, wherein the reference 

2 sequence is from a human immunodeficiency virus. 

1 79. The method of claim 78, wherein the reference 

2 sequence is from a human immunodeficiency virus and the target 

3 sequence is from a second human immunodeficiency virus. 

1 80. The method of claim 79, wherein the target sequence 

2 has a substituted nucleotide relative to the reference 

3 sequence in at least one undetermined position, and the 

4 relative specific binding of the probes indicates the location 

5 of the position and the nucleotide occupying the position in 

6 the target sequence. 

1 81. The method of claim 80, wherein the target sequence 

2 has a substituted nucleotide relative to the reference 

3 sequence in at least one position, the substitution conferring 

4 drug resistance to the human immunodeficiency virus, and the 

5 relative specific binding of the probes reveals the 

6 substitution. 

1 82. The method of claim 78, wherein: 

2 the hybridizing step comprises hybridizing the 

3 target nucleic acid and a second target nucleic acid, the 

4 second target sequence being from a reverse transcriptase gene 

5 of a third human immunodeficiency virus, to the array; and 

6 the determining step comprises determining which 

7 probes, relative to one another, in the array bind 

8 specifically to the target nucleic acid or the second target 

9 nucleic acid, the relative specific binding of the probes 

10 indicating whether the target sequence is the same or 

11 different from the reference sequence and whether the second 

12 target sequence is the same or different from the reference 

13 sequence. 

1 83. The method of claim 82, wherein the first target 

2 sequence has a first label and the second target sequence has 

3 a second label different from the first label. 

1 84. The method of claim 82, wherein undetermined first 

2 and second proportions of the first and second target 

3 sequences are hybridized to the array and the specific binding 

4 indicates the proportions. 

CFTR Chip 

1 85. The array of claim 2, wherein the reference sequence 

2 is from a CFTR gene. 

1 86. The array of claim 85, wherein the reference 

2 sequence is exon 10 of a CFTR gene, and said array comprises 

3 over 1000 oligonucleotide probes, 10 to 18 nucleotides in 

4 length. 

1 87. The array of claim 85, wherein said array comprises 

2 a set of probes comprising a specific nucleotide sequence 

3 selected from the group of sequences comprising: 

4 3'- TTTATAXTAG; 

5 3'- TTATAGXAGA; 

6 3'- TATAGTXGAA; 

7 3'- ATAGTAXAAA; 

8 3'- TAGTAGXAAC; 

9 3'- AGTAGAXACC; 

10 3'- GTAGAAXCCA; 

11 3'- TAGAAAXCAC; and 

12 3'- AGAAACXACA; wherein each set comprises 4 probes, 

13 and X is individually A, G, C, and T for each set. 



1 88. The array of claim 85, wherein said group of 

2 sequences comprises: 

3 3'- TTTATAXTAGAAACC; 

4 3'- TTATAGXAGAAACCA; 

5 3'- TATAGTXGAAACCAC; 

6 3'- ATAGTAXAAACCACA; 

7 3'- TAGTAGXAACCACAA; 

8 3'- AGTAGAXACCACAAA; 

9 3'- GTAGAAXCCACAAAG; 

10 3'- TAGAAAXCACAAAGG; and 

11 3'- AGAAACXACAAAGGA; wherein each set comprises 4 

12 probes, and X is individually A, G, C, and T for each set. 
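
The nine listed 15-nucleotide sequences overlap one another in single-nucleotide steps, with X always at the seventh position from the 3'-end. The sketch below regenerates them from a single strand. That strand is not stated in the claims; it is reconstructed here by overlapping the listed sequences, so it should be treated as an assumption. The names PROBE_STRAND, probe_templates and expand are hypothetical.

```python
# Written 3'->5'. Reconstructed by overlapping the listed sequences (assumption).
PROBE_STRAND = "TTTATAGTAGAAACCACAAAGGA"

def probe_templates(strand, length=15, position=7):
    """Slide a window along the strand and mark the interrogation position
    (1-based, counted from the 3'-end) with X."""
    index = position - 1
    return [strand[s:s + index] + "X" + strand[s + index + 1:s + length]
            for s in range(len(strand) - length + 1)]

def expand(template):
    """Replace X with each of A, C, G and T to obtain the set of four probes."""
    return [template.replace("X", base) for base in "ACGT"]

templates = probe_templates(PROBE_STRAND)
for template in templates:
    print("3'-", template, expand(template))
```

Calling probe_templates with length=10 yields the shorter sequences of the preceding claim, since each is the first ten nucleotides of the corresponding 15-nucleotide sequence.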



1 89. The array of claim 32, wherein the forty first 

2 reference sequences are from a CFTR gene. 

1 90. The array of claim 89, wherein each of the forty 

2 first reference sequences includes a site of a mutation and at 

3 least one adjacent nucleotide. 

1 91. The array of claim 90, wherein each of the forty 

2 first reference sequences comprises at least five contiguous 

3 nucleotides from a CFTR gene. 

1 92. The array of claim 89, wherein at least one first 

2 reference sequence is from the coding strand of the cystic 

3 fibrosis gene and at least one first reference sequence is 

4 from the noncoding strand of the CFTR gene. 

1 93. An array of oligonucleotide probes immobilized on a 

2 solid support, the array comprising at least a group of probes 

3 comprising: 

4 a wildtype probe exactly complementary to a subsequence 

5 of a reference sequence from a cystic fibrosis gene, the 

6 segment having at least five interrogation positions 

7 corresponding to five contiguous nucleotides in the reference 

8 sequence, 

9 a first set of three mutant probes, each identical to the 

10 wildtype probe, except in a first of the five interrogation 

11 positions, which is occupied by a different nucleotide in each 

12 of the three mutant probes and the wildtype probe; 

13 a second set of three mutant probes, each identical to 

14 the wildtype probe, except in a second of the five 

15 interrogation positions, which is occupied by a different 

16 nucleotide in each of the three mutant probes and the wildtype 

17 probe; 

18 a third set of three mutant probes, each identical to the 

19 wildtype probe, except in a third of the five interrogation 

20 positions, which is occupied by a different nucleotide in each 

21 of the three mutant probes and the wildtype probe; 

22 a fourth set of three mutant probes, each identical to 

23 the wildtype probe, except in a fourth of the five 

24 interrogation positions, which is occupied by a different

25 nucleotide in each of the three mutant probes and the wildtype

26 probe; 

27 a fifth set of three mutant probes, each identical to the 

28 wildtype probe, except in a fifth of the five interrogation 

29 positions, which is occupied by a different nucleotide in each 

30 of the three mutant probes and the wildtype probe. 

1 94. The array of claim 93 comprising first and second 

2 groups of probes, each group as defined by claim 93, the first 

3 group comprising a wildtype probe exactly complementary to a 

4 first reference sequence, and the second group comprising a 

5 wildtype probe exactly complementary to a second reference 

6 sequence, wherein the second reference sequence is a mutated 

7 form of the first reference sequence. 

1 95. The array of claim 94, wherein the first reference 

2 sequence is from a CFTR gene and the second reference sequence 

3 is a mutated form of the first reference sequence. 



1 96. The method of claim 56, wherein the target sequence 

2 and the second target sequence are from heterozygous alleles 

3 of a CFTR gene. 

P53 Chip 

1 97. The array of claim 2, wherein the reference sequence 

2 is a sequence from a p53 gene. 

1 98. The array of claim 2, wherein the reference sequence 

2 is from an hMLH1 gene.

1 99. The array of claim 2, wherein the reference sequence 

2 is from an MSH2 gene. 

1 100. The array of claim 28, wherein the reference 

2 sequence is from a human P53 gene and the second reference 

3 sequence is from an hMLH1 gene.



1 101. The array of claim 100, further comprising:

2 ninth, tenth, eleventh and twelfth probe sets,

3 (1) the ninth probe set comprising a plurality of

4 probes, each probe comprising a segment of at least three 

5 nucleotides exactly complementary to a subsequence of a third 

6 reference sequence, the segment including at least one 

7 interrogation position complementary to a corresponding 

8 nucleotide in the third reference sequence, 

9 (2) the tenth, eleventh and twelfth probe sets, 

10 each comprising a corresponding probe for each probe in the 

11 ninth probe set, the probes in the tenth, eleventh and twelfth 

12 probe sets being identical to a sequence comprising the 

13 corresponding probe from the ninth probe set or a subsequence 

14 of at least three nucleotides thereof that includes the at 

15 least one interrogation position, except that the at least one 

16 interrogation position is occupied by a different nucleotide 

17 in each of the four corresponding probes from the ninth, 

18 tenth, eleventh and twelfth probe sets. 

1 102. The array of claim 97, wherein the first probe set 

2 has at least 60 interrogation positions corresponding to at least 60

3 contiguous nucleotides from exon 6. 

1 103. The array of claim 98, wherein the reference 

2 sequence is exon 5 of a p53 gene, the probes are 17 

3 nucleotides long, and the first set of probes is exactly 

4 complementary to the reference sequence, and the at least one 

5 interrogation position is at position 7, relative to a 3'-end

6 of each probe, which 3'-end is covalently attached to the 

7 substrate. 

Mitochondrial Chip 

1 104. The array of claim 2, wherein the reference

2 sequence is from a mitochondrial genome. 

1 105. The array of claim 104, wherein said reference 

2 sequence is a sequence of a D-loop region.

1 106. The array of claim 105, wherein the D-loop region is

2 full-length.

1 107. The array of claim 104, wherein said reference

2 sequence is at least 90% of a full-length mitochondrial

3 genome.

1 108. The array of claim 104, wherein the reference

2 sequence is bounded by positions 16280 to 356 of the

3 mitochondrial genome.

Fig. 5: Tiled Array with Probes for the Detection
of Point Mutations

3'-CCGACTACAGTCGTT
3'-CCGACTCCAGTCGTT
3'-CCGACTGCAGTCGTT
3'-CCGACTTCAGTCGTT

(57) Abstract

Methods are provided for detecting and quantitating gene sequences, such as mutated genes and oncogenes, in biological 
fluids. The fluid sample (e.g., plasma, serum, urine, etc.) is obtained, deproteinized and the DNA present in the sample is extract- 
ed. Following denaturation of the DNA, an amplification procedure, such as PCR or LCR, is conducted to amplify the mutated 
gene sequence.

DETECTION OF GENE SEQUENCES IN BIOLOGICAL FLUIDS



Government Support 

The research leading to this invention was 
supported by government funding pursuant to NIH Grant 
No. CA 47248.

Background of the Invention 

Soluble DNA is known to exist in the blood 
of healthy individuals at concentrations of about 
5 to 10 ng/ml. It is believed that soluble DNA is
present in increased levels in the blood of 
individuals having autoimmune diseases, particularly 
systemic lupus erythematosus (SLE) and other diseases 
including viral hepatitis, cancer and pulmonary 
embolism. It is not known whether circulating 
soluble DNA represents a specific type of DNA which 
is particularly prone to appear in the blood.
However, studies indicate that the DNA behaves as 
double-stranded DNA or as a mixture of 
double-stranded and single-stranded DNA, and that it 
is likely to be composed of native DNA with 
single-stranded regions. Dennin, R.H., Klin.
Wochenschr. 57:451-456, (1979). Steinman, C.R., J.
Clin. Invest., 73:832-841, (1984). Fournie, G.J. et
al., Analytical Biochem. 158:250-256, (1986). There 
is also evidence that in patients with SLE, the 
circulating DNA is enriched for human repetitive 
sequence (Alu) containing fragments when compared to 
normal human genomic DNA. 



In patients with cancer, the levels of 
circulating soluble DNA in blood are significantly
increased. Types of cancers which appear to have a 
high incidence of elevated DNA levels include 
pancreatic carcinoma, breast carcinoma, colorectal 
carcinoma and pulmonary carcinoma. In these forms of 
cancer, the levels of circulating soluble DNA in 
blood are usually over 50 ng/ml, and generally the 
mean values are more than 150 ng/ml. Leon et al., 
Can. Res. 37:646-650, 1977; Shapiro et al., Cancer
51:2116-2120, 1983. 

Mutated oncogenes have been described in 
experimental and human tumors. In some instances 
certain mutated oncogenes are associated with 
particular types of tumors. Examples of these are 
adenocarcinomas of the pancreas, colon and lung which 
have approximately a 75%, 50%, and 35% incidence 
respectively, of Kirsten ras (K-ras) genes, with 
mutations in positions 1 or 2 of codon 12. The most
frequent mutations are changes from glycine to valine
(GGT to GTT), glycine to cysteine (GGT to TGT), and
glycine to aspartic acid (GGT to GAT). Other, but 
less common mutations of codon 12 include mutations 
to AGT and CGT. K-ras genes in somatic cells of such 
patients are not mutated. 
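
The codon 12 substitutions named in the preceding paragraph can be tabulated with a short sketch. The Python below is illustrative only and is not part of the described method: it enumerates every single-base change at positions 1 and 2 of the wild type codon GGT and looks up the encoded amino acid in a partial genetic-code table. The entry for GCT (alanine) follows the standard genetic code but is not discussed in the text, and the function and variable names are hypothetical.

```python
# Partial standard genetic code: only GGT and the codons reachable from it
# by a single substitution at position 1 or position 2.
CODON_TO_AA = {
    "GGT": "Gly",                               # wild type codon 12
    "AGT": "Ser", "CGT": "Arg", "TGT": "Cys",   # position 1 changes
    "GAT": "Asp", "GCT": "Ala", "GTT": "Val",   # position 2 changes
}


def codon12_variants(wild_type="GGT", positions=(1, 2)):
    """Return {(position, mutant codon): amino acid} for single-base changes."""
    variants = {}
    for pos in positions:
        for base in "ACGT":
            if base == wild_type[pos - 1]:
                continue  # the wild type base is not a mutation
            codon = wild_type[:pos - 1] + base + wild_type[pos:]
            variants[(pos, codon)] = CODON_TO_AA[codon]
    return variants


if __name__ == "__main__":
    for (pos, codon), amino_acid in sorted(codon12_variants().items()):
        print(f"position {pos}: GGT -> {codon} (Gly -> {amino_acid})")
```

Each of the three possible substitutions at either position changes the encoded amino acid, which is why four allele specific primers per position suffice to cover the wild type and every mutant.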

The ability to detect sequences of mutated 
oncogenes or other genes in small samples of 
biological fluid, such as blood plasma, would provide 
a useful diagnostic tool. The presence of mutated 
K-ras gene sequences in the plasma would be 
indicative of the presence in the patient of a tumor
which contains mutated oncogenes. Presumably this
would be a specific tumor marker since there is no 
other known source of mutated K-ras genes. 
Therefore, this evaluation may be useful in 
suggesting and/or confirming a diagnosis. The amount 
of mutated K-ras sequences in the plasma may relate 
to the size of the tumor, the growth rate of the 
tumor and/or the regression of the tumor. Therefore, 
serial quantitation of mutated K-ras sequences may be 
useful in determining changes in tumor mass. Since 
most human cancers have mutated oncogenes, evaluation 
of plasma DNA for mutated sequences may have very 
wide applicability and usefulness. 

Summary Of The Invention 

This invention recognizes that gene 
sequences (e.g., oncogene sequences) exist in blood, 
and provides a method for detecting and quantitating 
gene sequences such as from mutated oncogenes and 
other genes in biological fluids, such as blood 
plasma and serum. The method can be used as a 
diagnostic technique to detect certain cancers and 
other diseases which tend to increase levels of 
circulating soluble DNA in blood. Moreover, this
method is useful in assessing the progress of 
treatment regimes for patients with certain cancers. 

The method of the invention involves the 
initial steps of obtaining a sample of biological 
fluid (e.g., urine, blood plasma or serum, sputum, 
cerebral spinal fluid), then deproteinizing and 
extracting the DNA. The DNA is then amplified by
techniques such as the polymerase chain reaction
(PCR) or the ligase chain reaction (LCR) in an allele 
specific manner to distinguish a normal gene sequence 
from a mutated gene sequence present in the sample. 
In one embodiment where the location of the mutation 
is known, the allele specific PCR amplification is 
performed using four pairs of oligonucleotide 
primers. The four primer pairs include a set of four 
allele specific first primers complementary to the 
gene sequence contiguous with the site of the
mutation on the first strand. These four primers are 
unique with respect to each other and differ only at 
the 3' nucleotide which is complementary to the wild 
type nucleotide or to one of the three possible 
mutations which can occur at this known position. 
The four primer pairs also include a single common 
primer which is used in combination with each of the 
four unique first strand primers. The common primer 
is complementary to a segment of a second strand of 
the DNA, at some distance from the position of the 
first primer. 
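
As a minimal sketch of the primer design just described, the Python below builds four allele specific primers that share one stem and differ only at the 3' nucleotide, then pairs each with a single common primer. The helper names are hypothetical; the example stem and common primer are the position 1 sequences listed in Table 1, used here only for illustration.

```python
BASES = "ACGT"


def allele_specific_primers(stem, wild_type_base):
    """Return four primers (5'-3') sharing `stem` and differing at the 3' base.

    The primer carrying the wild type base is listed first.
    """
    if wild_type_base not in BASES:
        raise ValueError("wild_type_base must be one of A, C, G, T")
    order = wild_type_base + "".join(b for b in BASES if b != wild_type_base)
    return [stem + base for base in order]


def primer_pairs(stem, wild_type_base, common_primer):
    """Pair each allele specific primer with the same common primer."""
    return [(primer, common_primer)
            for primer in allele_specific_primers(stem, wild_type_base)]


# Position 1 of K-ras codon 12: stem and common primer taken from Table 1.
pairs = primer_pairs("GTGGTAGTTGGAGCT", "G", "CAGAGAAACCTTTATCTG")

if __name__ == "__main__":
    for allele_primer, common in pairs:
        print(allele_primer, common)
```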

This amplification procedure amplifies a 
known base pair fragment which includes the 
mutation. Accordingly, this technique has the 
advantage of displaying a high level of sensitivity 
since it is able to detect only a few mutated DNA 
sequences in a background of a 10^7-fold excess of
normal DNA. The method is believed to be of much 
greater sensitivity than methods which detect point 
mutations by hybridization of a PCR product with 
allele specific radiolabeled probes which will not 
detect a mutation if the normal DNA is in more than 
20-fold excess.

The above embodiment is useful where a
mutation exists at a known location on the DNA. In 
another embodiment where the mutation is known to 
exist in one of two possible positions, eight pairs of
oligonucleotide primers may be used. The first set 
of four primer pairs (i.e., the four unique, allele 
specific primers, each of which forms a pair with a 
common primer) is as described above. The second set 
of four primer pairs comprises four allele specific 
primers complementary to the gene sequence contiguous 
with the site of the second possible mutation on the 
sense strand. These four primers are unique with 
respect to each other and differ at the terminal 3'
nucleotide which is complementary to the wild type 
nucleotide or to one of the three possible mutations 
which can occur at this second known position. Each 
of these allele specific primers is paired with 
another common primer complementary to the other 
strand, distant from the location of the mutation. 

The PCR techniques described above 
preferably utilize a DNA polymerase which lacks 
3' exonuclease activity and therefore the ability to
proofread. A preferred DNA polymerase is Thermus
aquaticus DNA polymerase.

During the amplification procedure, it is 
usually sufficient to conduct approximately 30 cycles 
of amplification in a DNA thermal cycler. After an 
initial denaturation period of 5 minutes, each 
amplification cycle preferably includes a 
denaturation period of about 1 minute at 95°C,
primer annealing for about 2 minutes at 58°C and an 
extension at 72°C for approximately 1 minute.

Following the amplification, aliquots of
amplified DNA from the PCR can be analyzed by
techniques such as electrophoresis through agarose 
gel using ethidium bromide staining. Improved
sensitivity may be attained by using labelled primers 
and subsequently identifying the amplified product by 
detecting radioactivity or chemiluminescence on
film. Labelled primers may also permit quantitation 
of the amplified product which may be used to 
determine the amount of target sequence in the 
original specimen. 

As used herein, allele specific 
amplification describes a feature of the method of 
the invention where primers are used which are 
specific to a mutant allele, thus enabling 
amplification of the sequence to occur where there is 
100% complementarity between the 3' end of the primer 
and the target gene sequence. Thus, allele specific 
amplification is advantageous in that it does not 
permit amplification unless there is a mutated 
allele. This provides an extremely sensitive 
detection technique. 

Brief Description Of The Drawings 

Figures 1A and 1B are diagrammatic
representations of the amplification strategy for the 
detection of a mutated K-ras gene with a mutation 
present at a single known location of K-ras.

Figures 2A and 2B are diagrammatic
representations of the amplification strategy for 
detection of a mutated K-ras gene with a mutation 
present at a second of two possible locations of 
K-ras.

Detailed Description of The Invention 

The detection of mutated DNA, such as 
specific single copy genes, is potentially useful for 
diagnostic purposes and/or for evaluating the extent
of a disease. Normal plasma is believed to contain 
about 10 ng of soluble DNA per ml. The concentration 
of soluble DNA in blood plasma is known to increase 
markedly in individuals with cancer and some other 
diseases. The ability to detect the presence of 
known mutated gene sequences, such as K-ras gene 
sequences, which are indicative of a medical 
condition, is thus highly desirable. 

The present invention provides a highly 
sensitive diagnostic method enabling the detection of 
such mutant alleles in biological fluid, even against 
a background of as much as a 10^7-fold excess of
normal DNA. The method generally involves the steps
of obtaining a sample of a biological fluid
containing soluble DNA, deproteinizing, extracting
and denaturing the DNA, followed by amplifying the 
DNA in an allele specific manner, using a set of 
primers among which is a primer specific for the 
mutated allele. Through this allele specific 
amplification technique, only the mutant allele is 
amplified. Following amplification, various
techniques may be employed to detect the presence of
amplified DNA and to quantify the amplified DNA. The
presence of the amplified DNA represents the presence 
of the mutated gene, and the amount of the amplified 
gene present can provide an indication of the extent 
of a disease. 

This technique is applicable to the 
identification in biological fluid of sequences from 
single copy genes, mutated at a known position on the 
gene. Samples of biological fluid having soluble DNA 
(e.g., blood plasma, serum, urine, sputum, cerebral
spinal fluid) are collected and treated to 
deproteinize and extract the DNA. Thereafter, the 
DNA is denatured. The DNA is then amplified in an
allele specific manner so as to amplify the gene 
bearing a mutation. 

During deproteinization of DNA from the 
fluid sample, the rapid removal of protein and the
virtually simultaneous deactivation of any DNase is
believed to be important. DNA is deproteinized by 
adding to aliquots of the sample an equal volume of 
20% NaCl and then boiling the mixture for about 3 to 
4 minutes. Subsequently, standard techniques can be 
used to complete the extraction and isolation of the 
DNA. A preferred extraction process involves 
concentrating the amount of DNA in the fluid sample 
by techniques such as centrifugation.

The use of the 20% NaCl solution, followed 
by boiling, is believed to rapidly remove protein and 
simultaneously inactivate any DNases present. DNA
present in the plasma is believed to be in the form
of nucleosomes and is thus believed to be protected 
from the DNases while in blood. However, once the 
DNA is extracted, it is susceptible to the DNases. 
Thus, it is important to inactivate the DNases at the 
same time as deproteinization to prevent the DNases 
from inhibiting the amplification process by reducing 
the amount of DNA available for amplification. 
Although the 20% NaCl solution is currently
preferred, it is understood that other concentrations 
of NaCl, and other salts, may also be used. 

Other techniques may also be used to extract 
the DNA while preventing the DNases from affecting 
the available DNA. Because plasma DNA is believed to 
be in the form of nucleosomes (mainly histones and 
DNA), plasma DNA could also be isolated using an 
antibody to histones or other nucleosomal proteins. 
Another approach could be to pass the plasma (or 
serum) over a solid support with attached antihistone
antibodies which would bind with the nucleosomes.
After rinsing, the nucleosomes can be eluted from the
antibodies as an enriched or purified fraction. 
Subsequently, DNA can be extracted using the above or 
other conventional methods. 

In one embodiment, the allele specific 
amplification is performed through the Polymerase 
Chain Reaction (PCR) using primers having 3' terminal
nucleotides complementary to specific point mutations 
of a gene for which detection is sought. PCR 
preferably is conducted by the method described by 
Saiki, "Amplification of Genomic DNA", PCR Protocols,
Eds. M.A. Innis, et al., Academic Press, San Diego
(1990), pp. 13. In addition, the PCR is conducted 
using a thermostable DNA polymerase which lacks 3' 
exonuclease activity and therefore the ability to 
repair single base mismatches at the 3' terminal 
nucleotide of the DNA primer during amplification. 
As noted, a preferred DNA polymerase is T. aquaticus
DNA polymerase. A suitable T. aquaticus DNA
polymerase is commercially available from
Perkin-Elmer as AmpliTaq DNA polymerase. Other
useful DNA polymerases which lack 3' exonuclease
activity include VentR (exo-), available from New
England Biolabs, Inc. (purified from strains of E.
coli that carry a DNA polymerase gene from the
archaebacterium Thermococcus litoralis), Hot Tub DNA
polymerase derived from Thermus flavus and available
from Amersham Corporation, and Tth DNA polymerase
derived from Thermus thermophilus, available from
Epicentre Technologies, Molecular Biology Resource
Inc., or Perkin-Elmer Corp.

This method conducts the amplification using 
four pairs of oligonucleotide primers. A first set of
four primers comprises four allele specific primers 
which are unique with respect to each other. The 
four allele specific primers are each paired with a 
common distant primer which anneals to the other DNA 
strand distant from the allele specific primer. One 
of the allele specific primers is complementary to 
the wild type allele (i.e., is allele specific to the 
normal allele) while the others have a mismatch at 
the 3' terminal nucleotide of the primer. As noted, 
the four unique primers are individually paired for
amplification (e.g., by PCR amplification) with a
common distant primer. When the mutated allele is 
present, the primer pair including the allele 
specific primer will amplify efficiently and yield a 
detectable product. While the mismatched primers may 
anneal, the strand will not be extended during 
amplification. 

The above primer combination is useful where 
a mutation is known to exist at a single position on 
an allele of interest. Where the mutation may exist 
at one of two locations, eight pairs of
oligonucleotide primers may be used. The first set
of four pairs is as described above. The second four
pairs of primers comprise four allele specific
oligonucleotide primers complementary to the gene 
sequence contiguous with the site of the second 
possible mutation on the sense strand. These four 
primers differ at the terminal 3' nucleotide which is 
complementary to the wild type nucleotide or to one 
of the three possible mutations which can occur at
this second known position. Each of the four allele 
specific primers is paired with a single common 
distant primer which is complementary to the 
antisense strand upstream of the mutation. 

During a PCR amplification using the above
primers, only the primer which is fully complementary
to the allele which is present will anneal and
extend. The primers having a non-complementary 
nucleotide may partially anneal, but will not extend 
during the amplification process. Amplification 
generally is allowed to proceed for a suitable number
of cycles, i.e., from about 20 to 40, and most
preferably for about 30. This technique amplifies a 
mutation-containing fragment of the target gene with 
sufficient sensitivity to enable detection of the 
mutated target gene against a significant background 
of normal DNA.

The K-ras gene has point mutations which 
usually occur at one or two known positions in a 
known codon. Other oncogenes may have mutations at 
known but variable locations. Mutations with the 
K-ras gene are typically known to be associated with 
certain cancers such as adenocarcinomas of the lung, 
pancreas, and colon. Figures 1A through 2B 
illustrate a strategy for detecting, through PCR 
amplification, a mutation occurring at position 1 or 
2 of the 12th codon of the K-ras oncogene. As 
previously noted, mutations at the first or second 
position of the 12th codon of K-ras are often 
associated with certain cancers such as 
adenocarcinomas of the lung, pancreas, and colon. 

Referring to Figures 1A and 1B, the DNA from
the patient sample is separated into two strands (A
and B), which represent the sense and antisense 
strands. The DNA represents an oncogene having a 
point mutation which occurs on the same codon (i.e., 
codon 12) at position 1 (X1). The allele-specific
primers used to detect the mutation at position 1
include a set of four P1 sense primers (P1-A), each
of which is unique with respect to the others. The
four P1-A primers are complementary to a gene
sequence contiguous with the site of the mutation on
strand A. The four P1-A primers preferably differ
from each other only at the terminal 3' nucleotide
which is complementary to the wild type nucleotide or
to one of the three possible mutations which can
occur at this known position. Only the P1-A primer
which is fully complementary to the
mutation-containing segment on the allele will anneal
and extend during amplification. 
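
The selection rule described above, under which only the primer that is fully matched at its 3' terminus is extended, can be sketched as follows. This is an idealized model, not the described procedure: the fragment below was assembled for illustration from the primer sequences of Table 1, the strand is written in the same 5'-3' orientation as the primers so that a perfect match appears as an exact substring, and partial annealing or low-level mispriming is ignored. All names are hypothetical.

```python
# Illustrative K-ras fragment spanning codon 12, assembled from the primer
# sequences in Table 1 (an assumption made for this sketch).
WILD_TYPE_FRAGMENT = "GTGGTAGTTGGAGCTGGTGGCGTAGGCAAGAGT"
CODON12_POSITION1 = 15  # 0-based index of the first base of codon 12 (GGT)

P1_PRIMERS = [
    "GTGGTAGTTGGAGCTG",  # wild type 3' base
    "GTGGTAGTTGGAGCTC",
    "GTGGTAGTTGGAGCTT",
    "GTGGTAGTTGGAGCTA",
]


def substitute(sequence, index, base):
    """Return `sequence` with the nucleotide at `index` replaced by `base`."""
    return sequence[:index] + base + sequence[index + 1:]


def extending_primers(primers, fragment):
    """Idealized rule: a primer extends only if it matches along its whole
    length, including the 3' terminal nucleotide."""
    return [primer for primer in primers if primer in fragment]


if __name__ == "__main__":
    mutant = substitute(WILD_TYPE_FRAGMENT, CODON12_POSITION1, "T")  # GGT -> TGT
    print("wild type:", extending_primers(P1_PRIMERS, WILD_TYPE_FRAGMENT))
    print("mutant:   ", extending_primers(P1_PRIMERS, mutant))
```

In this model exactly one of the four primers yields product for any given allele, so the primer pair that gives a detectable product identifies the nucleotide present at position 1.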

A common downstream primer (P1-B),
complementary to a segment of the B strand downstream
with respect to the position of the P1-A primers, is
used in combination with each of the P1-A primers.
The P1-B primer illustrated in Figure 1 anneals to
the allele and is extended during the PCR. Together,
the P1-A and P1-B primers identified in Table 1 and
illustrated in Figure 1B amplify a fragment of the
oncogene having 161 base pairs. 

Figures 2A and 2B illustrate a scheme 
utilizing an additional set of four unique, allele 
specific primers (P2-A) to detect a mutation which 
can occur at codon 12 of the oncogene, at position 2 
(X2). The amplification strategy illustrated in
Figures 1A and 1B would be used in combination with
that illustrated in Figures 2A and 2B to detect
mutations at either position 1 (X1) or position
2 (X2) in codon 12.

Referring to Figures 2A and 2B, a set of 
four unique allele specific primers (P2-A) are used 
to detect a mutation present at position 2 (X2) of
codon 12. The four P2-A primers are complementary to
the genetic sequence contiguous with the site of the
second possible mutation. These four primers are
unique with respect to each other and preferably 
differ only at the terminal 3' nucleotide which is 
complementary to the wild type nucleotide or to one 
of the three possible mutations which can occur at 
the second known position (X2).

A single common upstream primer (P2-B),
complementary to a segment of the A strand upstream 
of the mutation, is used in combination with each of 
the unique P2-A primers. The P2-A and P2-B primers 
identified in Table 1 and illustrated in Figure 2B 
will amplify a fragment having 146 base pairs. 

During the amplification procedure, the 
polymerase chain reaction is allowed to proceed for 
about 20 to 40 cycles and most preferably for 30 
cycles. Following an initial denaturation period of 
about 5 minutes, each cycle, using the AmpliTaq DNA 
polymerase, typically includes about one minute of 
denaturation at 95°C, two minutes of primer
annealing at about 58°C, and a one minute extension
at 72°C. While the temperatures and cycle times
noted above are currently preferred, it is noted that 
various modifications may be made. Indeed, the use 
of different DNA polymerases and/or different primers 
may necessitate changes in the amplification 
conditions. One skilled in the art will readily be 
able to optimize the amplification conditions. 
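
For planning purposes, the hold times stated above can be totalled with a short calculation. The sketch below is illustrative only; it sums the stated hold times and deliberately ignores the ramp time a thermal cycler needs between temperatures, so an actual run would be longer. The function name and parameters are hypothetical.

```python
def pcr_hold_minutes(cycles=30, initial_denaturation=5.0,
                     denaturation=1.0, annealing=2.0, extension=1.0):
    """Sum the programmed hold times (in minutes) for a PCR run.

    Defaults follow the preferred conditions in the text: a 5 minute initial
    denaturation, then per cycle 1 minute at 95 C, 2 minutes at 58 C and
    1 minute at 72 C. Ramp times between temperatures are not included.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_denaturation + cycles * (denaturation + annealing + extension)


if __name__ == "__main__":
    for n in (20, 30, 40):
        print(f"{n} cycles: {pcr_hold_minutes(n):.0f} minutes of hold time")
```

Under these assumptions the preferred 30 cycles amount to 125 minutes of hold time, and the stated range of 20 to 40 cycles spans 85 to 165 minutes.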

Exemplary DNA primers which are useful in
practicing the method of this invention to detect the 
K-ras gene, having point mutations at either the
first or second position in codon 12 of the gene, are 
illustrated in Table 1. 

TABLE 1 

Primers Used to Amplify (by PCR) Position 1
and 2 Mutations at Codon 12 of K-ras Gene

Sequence* (5'-3')      Strand   P1 or P2

GTGGTAGTTGGAGCTG       A        P1
GTGGTAGTTGGAGCTC       A        P1
GTGGTAGTTGGAGCTT       A        P1
GTGGTAGTTGGAGCTA       A        P1
CAGAGAAACCTTTATCTG     B        P1
ACTCTTGCCTACGCCAC      A        P2
ACTCTTGCCTACGCCAS      A        P2
ACTCTTGCCTACGCCAT      A        P2
ACTCTTGCCTACGCCAC      A        P2
GTACTGGTGGAGTATTT      B        P2

*Underlined bases denote mutations.

The primers illustrated in Table 1 are, of 
course, merely exemplary. Various modifications can 
be made to these primers as is understood by those 
having ordinary skill in the art. For example, the 
primers could be lengthened or shortened, however the 
3' terminal nucleotides must remain the same. In
addition, some mismatches 3 to 6 nucleotides back
from the 3' end may be made and would not be likely
to interfere with efficacy. The common primers can 
also be constructed differently so as to be 
complementary to a different site, yielding either a 
longer or shorter amplified product. 

In one embodiment, the length of each allele
specific primer can be different, making it possible 
to combine multiple allele specific primers with 
their common distant primer in the same PCR 
reaction. The length of the amplified product would
be indicative of which allele specific primer was 
being utilized with the amplification. The length of 
the amplified product would indicate which mutation 
was present in the specimen. 
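
The length-coded multiplex embodiment can be illustrated with a small sketch. The 161 base pair baseline is the P1 product size given in the text; the 5' extension lengths, the sizing tolerance and all names below are hypothetical values chosen only to show the idea. Lengthening a primer at its 5' end lengthens the amplified product by the same number of base pairs, so each allele yields a product of distinct size.

```python
BASE_PRODUCT_BP = 161  # P1 product size stated in the text

# Hypothetical 5' extensions (nucleotides) added to each allele specific
# primer, keyed by the primer's 3' terminal base.
FIVE_PRIME_EXTENSION = {"G": 0, "C": 6, "T": 12, "A": 18}


def expected_product_lengths():
    """Map each 3' terminal base to its expected product length in bp."""
    return {base: BASE_PRODUCT_BP + extra
            for base, extra in FIVE_PRIME_EXTENSION.items()}


def call_allele(observed_bp, tolerance_bp=2):
    """Return the 3' base whose expected product matches the observed size,
    or None if no product or more than one product is within tolerance."""
    matches = [base for base, length in expected_product_lengths().items()
               if abs(observed_bp - length) <= tolerance_bp]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    for size in (161, 167, 173, 179, 164):
        print(size, "bp ->", call_allele(size))
```

The spacing between product sizes has to exceed the resolution of the sizing method; a band that falls between two expected sizes is left uncalled rather than assigned.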

The primers illustrated in Table 1 and 
Figures 1B and 2B, and others which could be used,
can be readily synthesized by one having ordinary 
skill in the art. For example, the preparation of 
similar primers has been described by Stork et al.,
Oncogene , 6:857-862, 1991. 

Other amplification methods and strategies 
may also be utilized to detect gene sequences in 
biological fluids according to the method of the 
invention. For example, another approach would be to 
combine PCR and the ligase chain reaction (LCR) . 
Since PCR amplifies faster than LCR and requires 
fewer copies of target DNA to initiate, one could use 
PCR as a first step and then proceed to LCR. Primers
such as the common primers used in the allele 
specific amplification described previously which 
span a sequence of approximately 285 base pairs in
length, more or less centered on codon 12 of K-ras,
could be used to amplify this fragment, using
standard PCR conditions. The amplified product
(approximately a 285 base pair sequence) could then 
be used in a LCR or ligase detection reaction (LDR) 
in an allele specific manner which would indicate if 
a mutation was present. Another, perhaps less 
sensitive, approach would be to use LCR or LDR for 
both amplification and allele specific 
discrimination. The latter reaction is advantageous
in that it results in linear amplification. Thus the 
amount of amplified product is a reflection of the 
amount of target DNA in the original specimen and 
therefore permits quantitation. 

LCR utilizes pairs of adjacent
oligonucleotides which are complementary to the
entire length of the target sequence (Barany F., PNAS
88:189-193, 1991; Barany F., PCR Methods and
Applications 1: 5-16, 1991). If the target sequence 
is perfectly complementary to the primers at the 
junction of these sequences, a DNA ligase will link 
the adjacent 3' and 5' terminal nucleotides forming a 
combined sequence. If a thermostable DNA ligase is 
used with thermal cycling, the combined sequence will 
be sequentially amplified. A single base mismatch at 
the junction of the oligonucleotides will preclude
ligation and amplification. Thus, the process is 
allele specific. Another set of oligonucleotides 
with 3' nucleotides specific for the mutant would be 
used in another reaction to identify the mutant 
allele. A series of standard conditions could be 
used to detect all possible mutations at any known
site. LCR typically utilizes both strands of genomic
DNA as targets for oligonucleotide hybridization with 
four primers, and the product is increased 
exponentially by repeated thermal cycling. 

A variation of the reaction is the ligase 
detection reaction (LDR) which utilizes two adjacent 
oligonucleotides which are complementary to the 
target DNA and are similarly joined by DNA ligase 
(Barany F., PNAS 88:189-193, 1991). After multiple
thermal cycles the product is amplified in a linear 
fashion. Thus the amount of the product of LDR 
reflects the amount of target DNA. Appropriate
labeling of the primers allows detection of the 
amplified product in an allele specific manner, as 
well as quantitation of the amount of original target 
DNA. One advantage of this type of reaction is that 
it allows quantitation through automation (Nickerson
et al., PNAS 87:8923-8927, 1990).
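
The contrast between exponential amplification (LCR) and linear amplification (LDR) can be made concrete with an idealized model. The sketch below assumes a fixed per-cycle efficiency and no plateau, neither of which is stated in the text, and the function names are hypothetical. It shows why the linear product remains proportional to the starting amount of target.

```python
def lcr_product(target_copies, cycles, efficiency=1.0):
    """Idealized exponential growth: every cycle multiplies the product
    by (1 + efficiency), since ligated products serve as new templates."""
    return target_copies * (1.0 + efficiency) ** cycles


def ldr_product(target_copies, cycles, efficiency=1.0):
    """Idealized linear growth: every cycle adds `efficiency` ligated
    products per original target, and products are not templates."""
    return target_copies * efficiency * cycles


if __name__ == "__main__":
    for copies in (100, 300):
        print(f"{copies} targets, 20 cycles: "
              f"LCR {lcr_product(copies, 20):.3g}, "
              f"LDR {ldr_product(copies, 20):.3g}")
```

In this model both reactions are proportional to the input at full efficiency, but small cycle-to-cycle variations in efficiency are compounded in the exponential case and only summed in the linear case, which is one reason a linear reaction lends itself to quantitation.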

Examples of suitable oligonucleotides for 
use with LCR for allele specific ligation and 
amplification to identify mutations at position 1 in 
codon 12 of the K-ras gene are illustrated below in 
Table 2.

TABLE 2

Oligonucleotides (5'-3') for use in LCR 



Sequence* Strand PI or P2 

AGCTCCAACTACCACAAGTT A1 A

GCACTCTTGCCTACGCCACC A2-A A 

GCACTCTTGCCTACGCCACA A2-B A 

GCACTCTTGCCTACGCCACA A2-C A 

GCACTCTTGCCTACGCCACT A2-D A 

GGTGGCGTAGGCAAGAGTGC B1 B

AACTTGTGGTAGTTGGAGCT B2-A B 

AACTTGTGGTAGTTGGAGCA B2-B B 

AACTTGTGGTAGTTGGAGCT B2-C B 

AACTTGTGGTAGTTGGAGCS B2-D B 



*Underlined bases denote mutations. 



During an amplification procedure involving
LCR, four oligonucleotides are used at a time. For
example, oligonucleotide A1 and, separately, each of
the A2 oligonucleotides are paired on the sense
strand. Also, oligonucleotide B1 and, separately,
each of the B2 oligonucleotides are paired on the
antisense strand. For an LDR procedure, two
oligonucleotides are paired, i.e., A1 with each of
the A2 oligonucleotides, for linear amplification of
the normal and mutated target DNA sequence. 
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The method of the invention is applicable to 
the detection and quantitation of other oncogenes in 
DNA present in various biological fluids. The p53 
gene is a gene for which convenient detection and 
quantitation could be useful because alterations in 
this gene are the most common genetic anomaly in 
human cancer, occurring in cancers of many histologic 
types arising from many anatomic sites. Mutations of
the p53 gene may occur at multiple codons within the gene
but 80% are localized within 4 conserved regions, or
"hot spots", in exons 5, 6, 7 and 8. The most 
popular current method for identifying the mutations 
in p53 is a multistep procedure. It involves PCR 
amplification of exons 5-8 from genomic DNA, 
individually, in combination (i.e., multiplexing), or 
sometimes as units of more than one exon. An 
alternative approach is to isolate total cellular 
RNA, which is transcribed with reverse 
transcriptase. A portion of the reaction mixture is 
subjected directly to PCR to amplify the regions of 
p53 cDNA using a pair of appropriate oligonucleotides 
as primers. These two types of amplification are 
followed by single strand conformation polymorphism 
analysis (SSCP) which will identify amplified samples 
with point mutations from normal DNA by differences 
in mobility when electrophoresed in polyacrylamide 
gel. If a fragment is shown by SSCP to contain a
mutation, the fragment is amplified by asymmetric PCR
and the sequence determined by the dideoxy chain
termination method (Murakami et al., Cancer Res. 51:
3356-3361, 1991).






Further, the ligase chain reaction (LCR) may 
be useful with p53 since LCR is better able to 
evaluate multiple mutations at the same time. After 
determining the mutation, allele specific primers can
be prepared for subsequent quantitation of the 
mutated gene in the patient's plasma at multiple 
times during the clinical course. 

Preferably, the method of the invention is 
conducted using biological fluid samples of 
approximately 5ml. However, the method can also be 
practiced using smaller sample sizes in the event 
that specimen supply is limited. In such case, it 
may be advantageous to first amplify the DNA present 
in the sample using the common primers. Thereafter, 
amplification can proceed using the allele specific 
primers. 

The method of this invention may be embodied 
in diagnostic kits. Such kits may include reagents 
for the isolation of DNA as well as sets of primers 
used in the detection method, and reagents useful in 
the amplification. Among the reagents useful for the 
kit is a DNA polymerase used to effect the 
amplification. A preferred polymerase is Thermus 
aquaticus DNA polymerase available from Perkin-Elmer
as AmpliTaq DNA polymerase. For quantitation of the 
mutated gene sequences, the kit can also contain 
samples of mutated DNA for positive controls as well 
as tubes for quantitation by competitive PCR having 
the engineered sequence in known amounts.

The quantitation of the mutated K-ras
sequences may be achieved using either slot blot
Southern hybridization or competitive PCR. Slot blot
Southern hybridization can be performed utilizing
the allele specific primers as probes under 
relatively stringent conditions as described by 
Verlaan-de Vries et al., Gene 50:313-20, 1986. The 
total DNA extracted from 5 ml of plasma will be slot 
blotted with 10 fold serial dilutions, followed by 
hybridization to an end-labeled allele specific probe 
selected to be complementary to the known mutation in 
the particular patient's tumor DNA as determined 
previously by screening with the battery of allele 
specific primers and PCR and LCR. Positive 
autoradiographic signals will be graded
semiquantitatively by densitometry after comparison
with a standard series of diluted DNA (1-500 ng) from
tumor cell cultures which have the identical mutation
in codon 12 of the K-ras gene, prepared as slot blots in
the same way.

A modified competitive PCR (Gilliland et
al., Proc. Natl. Acad. Sci. USA 87:2725-29, 1990;
Gilliland et al., "Competitive PCR for Quantitation
of mRNA", PCR Protocols (Acad. Press), pp. 60-69,
1990) could serve as a potentially more sensitive
alternative to the slot blot Southern hybridization
quantitation method. In this method of quantitation,
the same pair of primers is utilized to amplify two
DNA templates which compete with each other during
the amplification process. One template is the
sequence of interest in unknown amount, i.e. mutated
K-ras, and the other is an engineered deletion mutant
in known amount which, when amplified, yields a
shorter product which can be distinguished from the 
amplified mutated K-ras sequence. Total DNA 
extracted from the plasma as described above will be 
quantitated utilizing slot blot Southern 
hybridization, utilizing a radiolabeled human 
repetitive sequence probe (BLUR 8) . This will allow a 
quantitation of total extracted plasma DNA so that 
the same amount can be used in each of the PCR 
reactions. DNA from each patient (100 ng) will be
added to a PCR master mixture containing P1 or P2
allele specific primers corresponding to the
particular mutation previously identified for each
patient in a total volume of 400 µl. Forty µl of
master mixture containing 10 ng of plasma DNA will be
added to each of 10 tubes containing 10 µl of
competitive template ranging from 0.1 to 10
attomoles. Each reaction mixture will contain dNTPs
(25 µM final concentration including [α-32P]dCTP at
50 µCi/ml), 50 pmoles of each primer, 2 mM MgCl2, 2
units of T. aquaticus DNA polymerase, 1 x PCR buffer,
50 µg/ml BSA, and water to a final volume of 40 µl.
Thirty cycles of PCR will be followed by
electrophoresis of the amplified products. Bands
identified by ethidium bromide will be excised, counted
and a ratio of K-ras sequence to deletion mutant
sequence calculated. To correct for difference in
molecular weight, cpm obtained for genomic K-ras
bands will be multiplied by 141/161 or 126/146,
depending upon whether position 1 (P1) or position 2
(P2) primers are used. (The exact ratio will depend
upon the length of the deletion mutant.) Data will
be plotted as log ratio of deletion template
DNA/K-ras DNA vs. log input deletion template DNA
(Gilliland et al. 1990a, 1990b).
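
The arithmetic of this competitive PCR readout can be sketched as follows. The Python fragment is a hedged illustration, not the procedure of the cited references: it applies the length correction quoted above (141/161 or 126/146), fits a straight line to log(deletion template/K-ras) against log(input deletion template) by ordinary least squares, and reads the target amount at the point where the ratio equals one. That equivalence-point reading is an assumption about how such a plot is usually interpreted, and the function names are hypothetical. Any counts passed in for demonstration are synthetic, not measured data.

```python
import math

# Length corrections quoted in the text for the P1 and P2 primer sets.
LENGTH_CORRECTION = {"P1": 141 / 161, "P2": 126 / 146}


def corrected_kras_cpm(kras_cpm, primer_set):
    """Scale the genomic K-ras band counts by the length correction."""
    return kras_cpm * LENGTH_CORRECTION[primer_set]


def estimate_target_attomoles(competitor_attomoles, competitor_cpm, kras_cpm,
                              primer_set="P1"):
    """Least-squares fit of log ratio vs. log input competitor.

    Assumption: the target amount is taken as the competitor input at
    which the fitted log ratio crosses zero (ratio of one).
    """
    xs = [math.log10(a) for a in competitor_attomoles]
    ys = [math.log10(c / corrected_kras_cpm(k, primer_set))
          for c, k in zip(competitor_cpm, kras_cpm)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)
```

A usage call would pass the series of competitor inputs (for example 0.1 to 10 attomoles) together with the counts of the two excised bands from each tube.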

A modified competitive PCR could also be
developed in which one primer has a modified 5' end
which carries a biotin moiety and the other primer
has a 5' end with a fluorescent chromophore. The
amplified product can then be separated from the
reaction mixture by adsorption to avidin or
streptavidin attached to a solid support. The amount
of product formed in the PCR can be quantitated by
measuring the amount of fluorescent primer
incorporated into double-stranded DNA by denaturing
the immobilized DNA by alkali and thus eluting the
fluorescent single strands from the solid support and
measuring the fluorescence (Landgraf et al., Anal.
Biochem. 182:231-235, 1991).

The competitive template preferably
comprises engineered deletion mutants with a sequence
comparable to the fragments of the wild type K-ras
and the mutated K-ras gene amplified by the P1 and P2
series of primers described previously, except there
will be an internal deletion of approximately 20
nucleotides. Therefore, the amplified products will be
smaller, i.e., about 140 base pairs and 125 base
pairs when the P1 primers and P2 primers are used,
respectively. Thus, the same primers can be used and
yet amplified products from the engineered mutants
can be readily distinguished from the amplified
genomic sequences.

Eight deletion mutants will be produced
using the polymerase chain reaction (Higuchi et al.,
Nucleic Acids Res. 16:7351-67, 1988; Vallette et al.,
Nucleic Acids Res. 17:723-33, 1989; Higuchi,
PCR Technology, Ch. 6, pp. 61-70 (Stockton Press,
1989)). The starting material will be normal genomic
DNA representing the wild-type K-ras or tumor DNA 
from tumors which are known to have each of the 
possible point mutations in position one and two of 
codon 12. The wild-type codon 12 is GGT. The 
following tumor DNA can be used: 

First position codon 12 mutations

G-A     A549
G-T*    Calu1, PR371
G-C     A2182, A1698

Second position codon 12 mutations

G-A*    Aspc1
G-T*    SW480
G-C     818-1, 181-4, 818-7

(*G-T transversions in the first or second position
account for approximately 80% of the point mutations
found in pulmonary carcinoma and GAT (aspartic acid)
or GTT (valine) are most common in pancreatic
cancer.)

The deletion mutants with an approximately
20 residue deletion will be derived as previously
described (Vallette et al. 1989). In summary, the P1
and P2 primers will be used in an allele specific
manner with the normal DNA or with DNA from the tumor
cell line with each specific mutation. Each of these
would be paired for amplification with a common
primer which contains the sequence of the common
primer normally used with either the P1 or P2 allele
specific primers, i.e., "P1-B" or "P2-B" at the 5'
end with an attached series of residues representing
sequences starting approximately 20 bases downstream,
thus spanning the deleted area (common deletion
primer 1 and 2, CD1 and CD2). The precise location
and therefore sequence of the 3' portion of the
primer will be determined after analysis of the
sequence of the ras gene in this region with OLIGO
(NBI, Plymouth, MN), a computer program which
facilitates the selection of optimal primers. The
exact length of the resultant amplified product is
not critical, so the best possible primer which will
produce a deletion of 20-25 residues will be
selected. For example, with P2 primers the allele
specific primer for the wild-type will be
5' ACTCTTGCCTACGCCAC 3' complementary to residues 35
to 51 in the coding sequence. To effect a deletion
of approximately 20 residues in the complementary
strand, the common upstream primer to be used with
the wild-type and the three allele specific primers
for mutations in position two of codon 12 will be 40
residues long (CD2) complementary to residues -95 to
-78 (the currently preferred common upstream primer
for use with P2 allele specific primers and residues
at approximately -58 to -25). The amplified shorter
product will be size-separated by gel electrophoresis
and purified by Prep-a-Gene (Biorad). DNA
concentrations will be determined by the ethidium
bromide staining with comparison to dilutions of DNA
of known concentration. This approach will be
repeated eight times, four times using the four P1 primers and
common primer (CD1) constructed as above, and four
times with the four P2 primers and common primer
(CD2). These deletion mutants will be amplified,
using the same allele specific primers used to
amplify the genomic DNA. Therefore, they can be used
subsequently in known serial dilutions in a
competitive PCR, as outlined above.

The invention is further illustrated by the 
following non-limiting examples. 

Example 1 

Blood was collected in 13 x 75 mm vacutainer
tubes containing 0.05 ml of 15% K3EDTA. The tubes
were immediately centrifuged at 4°C for 30 minutes at
1000 g, the plasma was removed and recentrifuged at
4°C for another 30 minutes at 1000 g.
The plasma was stored at -70°C. Next, DNA was
deproteinized by adding an equal volume of 20% NaCl
to 5 ml aliquots of plasma which were then boiled for
3 to 4 minutes. After cooling, the samples were
centrifuged at 3000 rpm for 30 minutes. The
supernatant was removed and dialysed against three
changes of 10 mM Tris-HCl (pH 7.5)/1 mM EDTA (pH 8.0)
("TE") for 18 to 24 hours at 4°C. The DNA was
extracted once with two volumes of phenol, 2 x 1 volume
phenol:chloroform:isoamyl alcohol (25:24:1) and 2 x 1
volume chloroform:isoamyl alcohol (24:1). DNA was
subsequently precipitated with NaCl at 0.3 M, 20 µg/ml
glycogen as a carrier and 2.5 volumes of 100% ethanol
at minus 20°C for 24 hours. DNA was recovered by
centrifugation in an Eppendorf Centrifuge at 4°C for
30 minutes. The DNA was then resuspended in a TE
buffer. The DNA extracted and prepared in the above
manner was then able to be amplified.

Example 2

An allele specific amplification of DNA
obtained and prepared according to Example 1 was
conducted by PCR as follows to detect the K-ras gene
in the DNA having a mutation at position 1 or 2 of
codon 12 of the K-ras gene. To each of eight
reaction tubes was added DNA extracted from 0.5 ml of
plasma in a total volume of 40 µl containing 67 mM
Tris-HCl (pH 8.8), 10 mM β-mercaptoethanol, 16.6 mM
ammonium sulfate, 6.7 µM EDTA, 2.0 mM MgCl2, 50 µg/ml
BSA, and 25 µM dNTP. Also, 50 pmoles of each of the
primers identified in Table 1 was included, together
with 3 units of Thermus aquaticus DNA polymerase
(available from Perkin-Elmer as AmpliTaq). PCR was
conducted with an initial denaturation at 95°C for 5
minutes, followed by 30 cycles of PCR amplification
in a DNA thermal cycler (Cetus; Perkin-Elmer Corp.,
Norwalk, Connecticut). Each amplification cycle
includes a 1 minute denaturation at 95°C, a 2 minute
primer annealing period at 58°C, and a 1 minute
extension period at 72°C.
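
The thermal cycling program just described can be written down as a small data structure, which makes the total programmed hold time easy to check. The Python sketch below is illustrative only: the class and function names are hypothetical, and ramp times between temperatures, which depend on the instrument, are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Values taken from the cycling conditions of Example 2.
INITIAL_DENATURATION = Step("initial denaturation", 95, 5)
CYCLE = (
    Step("denaturation", 95, 1),
    Step("primer annealing", 58, 2),
    Step("extension", 72, 1),
)
N_CYCLES = 30


def total_hold_minutes(initial, cycle, n_cycles):
    """Sum of programmed hold times; temperature ramps are not included."""
    return initial.minutes + n_cycles * sum(step.minutes for step in cycle)


if __name__ == "__main__":
    print(total_hold_minutes(INITIAL_DENATURATION, CYCLE, N_CYCLES))
```

With the stated values the programmed holds add up to 5 + 30 x 4 = 125 minutes.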

Following the completion of amplification,
10-15 µl of each of the PCR reaction products is
analyzed by electrophoresis in a 2% agarose gel/1X
TAE-0.5 µg/ml EtBr. The electrophoresis uses an
applied voltage of 100 volts for 90 minutes.
Photographs of the samples are then taken using
ultraviolet light under standard conditions.
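
Which reaction tube is expected to show a band can be reasoned through with a simple string comparison. The Python sketch below is a simplified model, not the assay itself: a sense primer has the same sequence as the sense strand, so an exact match of the whole primer, including its 3' terminal base, stands in for successful extension. The four position 1 primers are copied from SEQ ID NOs: 1-4 of the sequence listing. The wild-type fragment is an assumption, assembled here from the first B2 oligonucleotide and the B1 oligonucleotide of Table 2, and the helper names are hypothetical. Real discrimination also depends on reaction conditions and the polymerase.

```python
# Position 1 (P1) sense primers, copied from SEQ ID NOs: 1-4,
# keyed by their 3' terminal base.
P1_PRIMERS = {
    "G (wild type)": "GTGGTAGTTGGAGCTG",
    "C": "GTGGTAGTTGGAGCTC",
    "T": "GTGGTAGTTGGAGCTT",
    "A": "GTGGTAGTTGGAGCTA",
}

# Assumption: wild-type sense fragment around codon 12, assembled from
# two adjacent Table 2 oligonucleotides.
WILD_TYPE_SENSE = "AACTTGTGGTAGTTGGAGCT" + "GGTGGCGTAGGCAAGAGTGC"
CODON12_POS1 = 20  # 0-based index of the first base of codon 12 (GGT)


def mutate(sense, index, base):
    """Return a copy of the sense strand with one base substituted."""
    return sense[:index] + base + sense[index + 1:]


def expected_products(sense, primers):
    """Labels of primers that match the template over their full length."""
    return [label for label, primer in primers.items() if primer in sense]


if __name__ == "__main__":
    print(expected_products(WILD_TYPE_SENSE, P1_PRIMERS))
    print(expected_products(mutate(WILD_TYPE_SENSE, CODON12_POS1, "T"), P1_PRIMERS))
```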

It is understood that various modifications
can be made to the present invention without
departing from the scope of the claimed invention.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Sorenson, George D. 

(ii) TITLE OF INVENTION: Detection of 

Gene Sequences 
In Biological 
Fluids 

(iii) NUMBER OF SEQUENCES: 20 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Lahive & Cockfield 

(B) STREET: 60 State Street 

(C) CITY: Boston 

(D) STATE: Massachusetts 

(E) COUNTRY: U.S.A. 

(F) ZIP: 02109

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy Disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 27-APR-1992

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: William C. Geary III

(B) REGISTRATION NUMBER: 31,357

(C) REFERENCE/DOCKET NUMBER: DCI-037

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (617) 227-7400

(B) TELEFAX: (617) 227-5941

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GTGGTAGTTG GAGCTG 16



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
GTGGTAGTTG GAGCTC 16 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GTGGTAGTTG GAGCTT 16



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GTGGTAGTTG GAGCTA 16



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CAGAGAAACC TTTATCTG 18



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
ACTCTTGCCT ACGCCAC 17



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ACTCTTGCCT ACGCCAG 17 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
ACTCTTGCCT ACGCCAA 17 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
ACTCTTGCCT ACGCCAT 17



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GTACTGGTGG AGTATTT 17 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AGCTCCAACT ACCACAAGTT 20 



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GCACTCTTGC CTACGCCACC 20



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GCACTCTTGC CTACGCCACA 20



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GCACTCTTGC CTACGCCACG 20



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GCACTCTTGC CTACGCCACT 20

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GGTGGCGTAG GCAAGAGTGC 20 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
AACTTGTGGT AGTTGGAGCT 20



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
AACTTGTGGT AGTTGGAGCA 20



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
AACTTGTGGT AGTTGGAGCC 20 

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) SEQUENCE DESCRIPTION: SEQ ID NO:20:
AACTTGTGGT AGTTGGAGCG 20

Claims:

1. A method of detecting a mutant allele, 
comprising the steps of: 

providing a sample of a biological 
fluid containing soluble DNA, including a mutant 
allele of interest; 

extracting the DNA from the sample; 

denaturing the DNA to free first and 
second strands of the DNA;

amplifying the mutant allele of
interest in an allele specific manner using at least 
a first set of four allele specific oligonucleotide 
primers having one primer complementary to a 
mutation-containing segment on a first strand of the 
DNA and a first common primer for pairing during 
amplification to each allele specific primer, the 
common primer being complementary to a segment of a 
second strand of the DNA distant with respect to the 
position of the first primer; and 

detecting the presence of the mutant 
allele of interest. 

2. The method of claim 1 further 
comprising the step of removing protein from the 
sample and inactivating any DNase within the sample 
before the step of extracting the DNA.

3. The method of claim 2, wherein the
mutant allele is amplified in an allele specific
manner using the polymerase chain reaction (PCR).

4. The method of claim 3, wherein
following the amplification step, the step of
detecting the presence of the mutant allele of
interest comprises performing an allele specific
ligase chain reaction (LCR) or a ligase detection
reaction (LDR) using the amplified product of PCR.

5. The method of claim 2 wherein protein 
is removed and DNases are inactivated by adding a 
salt solution to the sample and subsequently boiling 
the sample. 

6. The method of claim 2 wherein the
biological fluid is selected from the group 
consisting of whole blood, serum, plasma, urine, 
sputum, and cerebral spinal fluid. 

7. The method of claim 2 wherein the 
mutant allele comprises a gene sequence having a 
point mutation at a known location. 

8. The method of claim 7 wherein the first 
DNA strand is the sense strand and the second DNA 
strand is the antisense strand. 

9. The method of claim 2 wherein the step
of amplifying the mutant allele with the PCR is
conducted using a DNA polymerase which lacks the 3'
exonuclease activity and therefore the ability to
repair single nucleotide mismatches at the 3' end of
the primer.

10. The method of claim 9 wherein the DNA
polymerase is a Thermus aquaticus DNA polymerase.



11. The method of claim 9 wherein the first
set of allele specific oligonucleotide primers
comprises:

four sense primers, one of which has a
3' terminal nucleotide complementary to a point
mutation of the sense strand, and the remaining three
of which are complementary to the wild type sequence
for the segment to be amplified and to sequences
having the remaining two possible mutations at the
mutated point of the sense strand; and

a common antisense primer complementary
to a segment of the antisense strand distant from the
location on the sense strand at which the sense
primers will anneal, the common antisense primer
being paired with each of the sense primers during
amplification.

12. The method of claim 11 wherein the 3'
terminal nucleotide of the complementary sense primer
anneals with the mutated nucleotide of the sense
strand.

13. The method of claim 3 wherein the
mutant allele comprises a gene sequence having a
point mutation at one of two known locations.

14. The method of claim 13 wherein the step
of amplifying the mutant allele through the PCR 
further comprises the use of a second set of four 
allele specific oligonucleotide primers, in 
conjunction with the first set, wherein the second 
set of allele specific oligonucleotide primers 
comprises: 

four sense primers, one of which has a 
3' terminal nucleotide complementary to a point 
mutation of the sense strand, and the remaining three 
of which are complementary to the wild type sequence 
for the segment to be amplified and sequences having 
the remaining two possible mutations at the mutated 
point of the sense strand; and 

a common antisense primer complementary 
to a segment of the antisense strand distant from the 
location on the sense strand at which the sense 
primers will anneal, the common antisense primer 
being paired with each of the sense primers during 
amplification. 

15. The method of claim 14 wherein the 3' 
terminal nucleotide of the complementary sense primer 
anneals with the mutated nucleotide of the sense 
strand.

16. The method of claim 15 wherein the 
mutant allele to be detected is the K-ras gene 
sequence having a mutation at position 1 or 2 in the 
twelfth codon. 



17. The method of claim 16 wherein the
first set of allele specific oligonucleotide primers
comprises sense primers having the following sequences

5' GTGGTAGTTGGAGCTG 3' (wild type)
5' GTGGTAGTTGGAGCTC 3'
5' GTGGTAGTTGGAGCTT 3'
5' GTGGTAGTTGGAGCTA 3'

and the common antisense primer having the following
sequence

5' CAGAGAAACCTTTATCTG 3'.

18. The method of claim 14 wherein the
second set of allele specific oligonucleotide primers
comprises sense primers having the following sequences

5' ACTCTTGCCTACGCCAC 3' (wild type)
5' ACTCTTGCCTACGCCAG 3'
5' ACTCTTGCCTACGCCAT 3'
5' ACTCTTGCCTACGCCAA 3'

and the common antisense primer having the following
sequence

5' GTACTGGTGGAGTATTT 3'.



19. The method of claim 2 wherein the step 
of detecting the presence of amplified DNA is 
conducted by gel electrophoresis in 1-5% agarose gel.

20. The method of claim 2 wherein the
biological fluid is selected from the group
consisting of whole blood, serum, plasma, urine,
sputum, and cerebral spinal fluid.

21. A diagnostic kit for detecting the 
presence of a mutated K-ras gene sequence in 
biological fluid, wherein the mutation is present in 
the twelfth codon at position 1, comprising: 

reagents to facilitate the
deproteinization and isolation of DNA;

reagents to facilitate amplification by
PCR;

a heat stable DNA polymerase; and

a first set of allele specific
oligonucleotide sense primers having the following
sequences

5' GTGGTAGTTGGAGCTG 3'
5' GTGGTAGTTGGAGCTC 3'
5' GTGGTAGTTGGAGCTT 3'
5' GTGGTAGTTGGAGCTA 3'

and a first common antisense primer having
the following sequence

5' CAGAGAAACCTTTATCTG 3'

22. The diagnostic kit of claim 21 further 
comprising 

a second set of allele specific 
oligonucleotide sense primers having the following 
sequences

5' ACTCTTGCCTACGCCAC 3'
5' ACTCTTGCCTACGCCAG 3'
5' ACTCTTGCCTACGCCAT 3'
5' ACTCTTGCCTACGCCAA 3'

and a second common antisense primer having
the following sequence

5' GTACTGGTGGAGTATTT 3'

wherein the second set of allele specific 
oligonucleotide primers and the second common primer 
are useful in detecting in biological fluid the 
presence of a mutated K-ras gene sequence in the 
twelfth codon at position 2.

Rapid detection of nucleic acid sequences in a sample by labeling the sample.

A method for detecting one or more microorganisms or polynucleotide sequences from eukaryotic sources
in a nucleic acid-containing test sample comprising

(a) preparing a test sample comprising labeling the nucleic acids in the test sample,

(b) preparing one or more probes by immobilizing a single-stranded nucleic acid of one or more
known microorganisms or sequences from eukaryotic sources,

(c) contacting, under hybridization conditions, the labeled single-stranded nucleic acid to form
hybridized labeled nucleic acids, and

(d) assaying for the hybridized nucleic acids by detecting the label. The method can be used to
detect genetic disorders, e.g., sickle-cell anemia.

RAPID DETECTION OF NUCLEIC ACID SEQUENCES IN A SAMPLE BY LABELING THE SAMPLE



BACKGROUND OF THE INVENTION

Field of the Invention

The present application relates to the detection and identification of microorganisms and the detection
and identification of particular prokaryotic or eukaryotic DNA sources in a nucleic acid containing test
sample.

Still further, the present invention relates to a method for the lysis of whole cells.

Background Information

A. The Detection of Microorganisms

The identification of species of microorganisms in a sample containing a mixture of microorganisms, by
immobilizing the DNA from the sample and subjecting it to hybridization with a labelled specimen of
species-specific DNA from a known microorganism and observing whether hybridization occurs between
the immobilized DNA and the labelled specimen, has been disclosed in PCT patent application No.
PCT/US83/01029.

The most efficient and sensitive method of detection of nucleic acids such as DNA after hybridization 
requires radioactively labeled DNA. The use of autoradiography and enzymes prolongs the assay time and 
requires experienced technicians. 

U.S.P. 4,358,535 to Falkow et al. describes infectious disease diagnosis using labeled nucleotide probes
complementary to nucleic acid coding for a characteristic pathogen product.

B. The Detection of Specific Eukaryotic Sequences

The identification of specific sequence alteration in a eukaryotic nucleic acid sample by immobilizing
the DNA from the sample and subjecting it to hybridization with a labeled oligonucleotide and observing
whether hybridization occurs between the immobilized DNA and the labeled probe, has been described in
EP patent application No. 86 117 978 filed December 23, 1986, now pending.

It is known that the expression of a specific gene determines the physical condition of a human being.
For example, a change in the beta-globin gene coding sequence from GAG to GTG at the sixth amino acid
position produces sickle-beta-globin and a homozygote can have a disease known as sickle cell anemia.
Similarly deletion of particular sequences from alpha-globin or beta-globin genes can cause thalassemias. A
recent survey, The New Genetics and Clinical Practice, D.J. Weatherall, The Nuffield Provincial Hospitals
Trust, (1982), chapter 2 describes the frequency and clinical spectrum of genetic diseases.
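
The GAG to GTG change mentioned above is a single-base substitution that alters the encoded amino acid. The small Python sketch below merely restates that fact in executable form. It uses only the two codon assignments needed from the standard genetic code (GAG for glutamic acid, GTG for valine), and the function name is hypothetical.

```python
# Only the two standard-genetic-code entries needed for this example.
CODONS = {"GAG": "Glu", "GTG": "Val"}


def describe_substitution(normal, mutant):
    """List differing positions (1-based) and the amino acids encoded."""
    changed = [(i + 1, a, b)
               for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]
    return changed, CODONS[normal], CODONS[mutant]


if __name__ == "__main__":
    print(describe_substitution("GAG", "GTG"))
```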

Problems associated with genetic defects can be diagnosed by nucleic acid sequence information. The
easiest way to detect such sequence information is to use the method of hybridization with a specific probe
of a known sequence.

U.S. Patent No. 4,395,486 to Wilson et al. describes a method for the direct analysis of sickle cell anemia using a
restriction endonuclease assay.

Edward M. Rubin and Yuet Wai Kan, "A Simple Sensitive Prenatal Test for Hydrops Fetalis Caused By
α-Thalassaemia", The Lancet, January 12, 1985, pp. 75-77, describes a dot blot analysis to differentiate
between the genotypes of homozygous alpha-thalassemia and those of the haemoglobin-H disease and
alpha-thalassemia trait.

The most efficient and sensitive method of detection of nucleic acids, such as DNA, after hybridization 
requires radioactively labelled DNA. The use of autoradiography and enzymes prolongs the assay time and
requires experienced technicians. 

Recently, a non-radioactive method of labelling DNA was described by Ward et al, European Patent
Application 63,879. Ward et al use the method of nick translation to introduce biotinylated U (uracil)
residues into DNA, replacing T (thymine). The biotin residue is then assayed with antibiotin antibody or an
avidin-containing system. The detection in this case is quicker than autoradiography, but the nick translation
method requires highly skilled personnel. Moreover, biotinylation using biotinylated UTP (uridine
triphosphate) works only for thymine-containing polynucleotides. The use of other nucleoside triphosphates
is very difficult because the chemical derivatization of A (adenine) or G (guanine) or C (cytosine) (containing
-NH2) with biotin requires the skills of trained organic chemists.


C. Cell Lysis 

The present invention also provides a method for the efficient lysis of whole cells such that their DNA is
released and made available for photochemical labeling. While eukaryotic cells derived from multicellular
animals are easily lysed under relatively mild conditions, single cell eukaryotes and prokaryotes, especially
Gram positive prokaryotes, are more difficult to lyse due to the complicated chemical nature and extent of
cross-linking of their cell walls. Methods do exist for efficiently lysing these refractory organisms, either by
chemical-enzymatic or physical means, but these methods are often complicated, time-consuming and
inappropriate for preserving the integrity of DNA.



SUMMARY OF THE INVENTION 

It is accordingly an object of the present invention to provide a method for detection of microorganisms
in a nucleic acid-containing test sample. 

It is another object of the invention to provide a method for a simultaneous assay for the presence of 
more than one nucleic acid sequence. 

Another object is to provide a method to identify particular prokaryotic or eukaryotic DNA sequences
and a method for distinguishing alleles of individual genes.

Another object of the invention is to provide a simple photochemical method of labeling the unknown 
test sample. 

A further object of the invention is to label the probes with different kinds of labels so that when the
probes are hybridized with an immobilized, unknown, unlabelled test sample, the type of label remaining
bound after hybridization and washing will determine the type of nucleic acid sequence present in the
unknown sample.

A still further object of the invention is to use whole chromosomal nucleic acid as the probe and/or as 
the test sample. 

The invention also relates to the use of oligonucleotides as immobilized probes.

These and other objects and advantages are realized in accordance with the present invention for a
method of detecting nucleic acid sequences in a nucleic acid-containing test sample.
The method involves the following:
(a) preparing a test sample comprising labeling the nucleic acids of the organisms or cells or cell
debris in the test sample,

(b) preparing one or more probes by immobilizing a single-stranded DNA or an oligonucleotide of
one or more known microorganisms or eukaryotes, or sequences representing particular genes or their
alleles,

(c) contacting, under hybridization conditions, the labeled single-stranded sample nucleic acid and
the immobilized single-stranded (probe) nucleic acid or the immobilized oligonucleotide to form hybridized
labeled nucleic acids, and

(d) assaying for the hybridized nucleic acids by detecting the label.
In the above method, steps (a) and (b) can be reversed. 

The method further comprises denaturing the labeled nucleic acids from step (a) to form labeled 
denatured nucleic acids. 

According to the invention, a labeled nucleic acid test sample is contacted simultaneously with several
different types of DNA probes for hybridization. The nucleic acid test sample is labeled and hybridized with
several unlabeled immobilized probes. The positions of the probes are fixed, so the position at which label is
detected after hybridization indicates that the test sample carries a nucleic acid sequence complementary
to the corresponding probe.

Nucleic acid probes for several microbiological systems or for different alleles of one or more genes
can be immobilized separately on a solid support, for example, nitrocellulose paper. The test sample
nucleic acids are labeled and remain in solution. The solid material containing the immobilized probe is
brought in contact with the labeled test nucleic acid solution under hybridization conditions. The solid
material is washed free of unhybridized nucleic acid and the label is assayed. The presence of the label
with one or more of the probes indicates that the test sample contains nucleic acids substantially
complementary to those probes and hence originating, for example, from an infection by particular
microbiological systems.
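
The position-based readout described above can be sketched as a lookup from spot position to probe identity. This is only a minimal illustration: the layout, the probe names, the signal values and the background threshold are all hypothetical and are not taken from the specification.

```python
# Hypothetical layout: each spot position (row, column) on the solid support
# holds one known, unlabeled probe. The names are illustrative only.
PROBE_LAYOUT = {
    (0, 0): "probe for microorganism 1",
    (0, 1): "probe for microorganism 2",
    (1, 0): "probe for allele A",
    (1, 1): "probe for allele B",
}


def probes_with_label(signals, layout=PROBE_LAYOUT, threshold=0.5):
    """Return the probes at whose positions label was detected.

    signals maps a spot position to the label signal measured there after
    hybridization and washing. threshold is an assumed background cutoff.
    """
    return sorted(
        layout[position]
        for position, signal in signals.items()
        if position in layout and signal > threshold
    )


if __name__ == "__main__":
    measured = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.05, (1, 1): 0.8}
    print(probes_with_label(measured))
```

Because the sample carries the label and the probes do not, a single labeling step is enough to interrogate every spot on the support at once.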

Labeling can be accomplished in a whole living cell or a cell lysate, and can be non-isotopic. The 
nucleic acid can be used for hybridization without further purification. 

The present invention also concerns specific lysis conditions to release nucleic acids from both gram 
positive and gram negative bacteria. 

The present invention further concerns a kit for detecting microorganisms or eukaryotes in a test 
sample comprising 

(a) a solid support containing single-stranded DNA of one or more known microorganisms or
eukaryotes immobilized thereon, e.g., a strip containing dots or spots of known microorganisms or
eukaryotes,

(b) a reagent for labeling the nucleic acid of the test sample, 

(c) a reagent for releasing and denaturing DNA in the test sample, and 

(d) hybridization reagents. 

For chemiluminescence detection of the hybridized nucleic acid, the kit would further comprise a 
reagent for chemiluminescent detection. 

In the above described kit, the reagent for labeling is given hereinbelow in a discussion on labels. 

Reagents for releasing and denaturing DNA include sodium hydroxide and lysing agents such as 
detergents and lysozymes. 

Typical hybridization reagents include a mixture of sodium chloride, sodium citrate, SDS (sodium
dodecyl sulfate), bovine serum albumin, nonfat milk or dextran sulfate and optionally formamide.



BRIEF DESCRIPTION OF THE FIGURES 

Fig. 1 is an autoradiograph of results of immobilization of an oligonucleotide sequence specific for
hemoglobin mutation. 

Fig. 2 is a photograph of results of hybridization with labeled genomic DNA for non-radioactive
detection.



DETAILED DESCRIPTION OF THE INVENTION 

The nucleic acid is preferably labeled by means of photochemistry, employing a photoreactive DNA- 
binding furocoumarin or a phenanthridine compound to link the nucleic acid to a label which can be "read" 
or assayed in conventional manner, including fluorescence detection. The end product is thus a labeled 
nucleic acid comprising (a) a nucleic acid component, (b) an intercalator or other DNA-binding ligand 
photochemically linked to the nucleic acid component, and (c) a label chemically linked to (b).

The photochemical method provides more favorable reaction conditions than the usual chemical 
coupling method for biochemically sensitive substances. The intercalator and label can first be coupled and 
then photoreacted with the nucleic acid, or the nucleic acid can first be photoreacted with the intercalator 
and then coupled to the label. 

A general scheme for coupling a nucleic acid, exemplified by double-stranded DNA, to apply a label, is 
as follows:

Where the hybridizable portion of the nucleic acid is in a double stranded form, such portion is then
denatured to yield a hybridizable single stranded portion. Alternatively, where the labeled DNA comprises 
the hybridizable portion already in single stranded form, such denaturation can be avoided if desired. 
Alternatively, double stranded DNA can be labeled by the approach of the present invention after 
hybridization has occurred using a hybridization format which generates double stranded DNA only in the 
presence of the sequence to be detected. 

To produce specific and efficient photochemical products, it is desirable that the nucleic acid compo- 
nent and the photoreactive intercalator compound be allowed to react in the dark in a specific manner. 

For coupling to DNA, aminomethyl psoralen, aminomethyl angelicin and aminoalkyl ethidium or
methidium azides are particularly useful compounds. They bind to double-stranded DNA and only the
complex produces photoadduct. In the case where labeled double-stranded DNA must be denatured in
order to yield a hybridizable single stranded region, conditions are employed so that simultaneous
interaction of two strands of DNA with a single photoadduct is prevented. It is necessary that the frequency
of modification along a hybridizable single stranded portion of the nucleic acid not be so great as to
substantially prevent hybridization, and accordingly there preferably will be not more than one site of
modification per 25, more usually 50, and preferably 100, nucleotide bases. Angelicin derivatives are
superior to psoralen compounds for monoadduct formation. If a single-stranded DNA is
covalently attached to some extra double-stranded DNA, use of phenanthridium and psoralen compounds is
desirable since these compounds interact specifically with double-stranded DNA in the dark. The chemistry
for the synthesis of the coupled reagents to modify nucleic acids for labeling, described more fully
hereinbelow, is similar for all cases.
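
As a worked illustration of the modification-frequency limits stated above, the following sketch computes the largest number of modification sites a strand of a given length could carry at one site per 25, 50 or 100 bases. The function name and the 1000-base example strand are assumptions made for illustration.

```python
def max_modification_sites(strand_length, bases_per_site=100):
    """Upper bound on modification sites at one site per `bases_per_site` bases."""
    if strand_length < 0 or bases_per_site <= 0:
        raise ValueError("strand_length must be >= 0 and bases_per_site > 0")
    return strand_length // bases_per_site


if __name__ == "__main__":
    # Hypothetical 1000-base hybridizable strand at the three stated spacings.
    for spacing in (25, 50, 100):
        print(spacing, max_modification_sites(1000, spacing))
```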

The nucleic acid component can be single or double stranded DNA or RNA or fragments thereof such 
as are produced by restriction enzymes or even relatively short oligomers. 

The DNA-binding ligands of the present invention used to link the nucleic acid component to the label 
can be any suitable photoreactive form of known DNA-binding ligands. Particularly preferred DNA-binding 
ligands are intercalator compounds such as the furocoumarins, e.g., angelicin (isopsoralen) or psoralen or 
derivatives thereof which photochemically will react with nucleic acids, e.g., 4'-aminomethyl-4,5'-dimethyl
angelicin, 4'-aminomethyltrioxsalen (4'-aminomethyl-4,5',8-trimethylpsoralen), 3-carboxy-5- or -8-amino- or
-hydroxy-psoralen, as well as mono- or bis-azido aminoalkyl methidium or ethidium compounds.

Particularly useful photoreactive forms of intercalating agents are the azidointercalators. Their reactive 
nitrenes are readily generated at long wavelength ultraviolet or visible light and the nitrenes of arylazides 
prefer insertion reactions over their rearrangement products (see White et al, Methods in Enzymol., 46, 644
(1977)). Representative intercalating agents include azidoacridine, ethidium monoazide, ethidium diazide,
ethidium dimer azide (Mitchell et al, JACS, 104, 4265 (1982)), 4-azido-7-chloroquinoline, and
2-azidofluorene. A specific nucleic acid binding azido compound has been described by Forster et al, Nucleic
Acid Res., 13, (1985), 745. The structure of such compound is as follows:

Other useful photoreactable intercalators are the furocoumarins which form (2+2) cycloadducts with
pyrimidine residues. Alkylating agents can also be used such as bis-chloroethylamines and epoxides or
aziridines, e.g., aflatoxins, polycyclic hydrocarbon epoxides, mitomycin and norphillin A.

Nonlimiting examples of intercalator compounds for use in the present invention include acridine dyes, 
phenanthridines, phenazines, furocoumarins, phenothiazines and quinolines. 

The label which is linked to the nucleic acid component according to the present invention can be any
chemical group or residue having a detectable physical or chemical property, i.e., labeling can be
conducted by chemical reaction or physical adsorption. The label will bear a functional chemical group to
enable it to be chemically linked to the intercalator compound. Such labeling materials have been well
developed in the field of immunoassays and in general most any label useful in such methods can be
applied to the present invention. Particularly useful are enzymatically active groups, such as enzymes (see
Clin. Chem., (1976), 22, 1243), enzyme substrates (see British Pat. Spec. 1,548,741), coenzymes (see U.S.
Patent Nos. 4,230,797 and 4,238,565) and enzyme inhibitors (see U.S. Patent No. 4,134,792); fluorescers
(see Clin. Chem., (1979), 25, 353), and chromophores including phycobiliproteins; luminescers such as
chemiluminescers and bioluminescers (see Clin. Chem., (1979), 25, 512, and ibid, 1531); specifically
bindable ligands, i.e., protein binding ligands; and residues comprising radioisotopes such as 3H, 35S, 32P,
125I, and 14C. Such labels are detected on the basis of their own physical properties (e.g., fluorescers,
chromophores and radioisotopes) or their reactive or binding properties (e.g., enzymes, substrates,
coenzymes and inhibitors). For example, a cofactor-labeled nucleic acid can be detected by adding the
enzyme for which the label is a cofactor and a substrate for the enzyme. A hapten or ligand (e.g., biotin)
labeled nucleic acid can be detected by adding an antibody or an antibody pigment to the hapten or a
protein (e.g., avidin) which binds the ligand, tagged with a detectable molecule. An antigen can also be
used as a label. Such detectable molecule can be some molecule with a measurable physical property
(e.g., fluorescence or absorbance) or a participant in an enzyme reaction (e.g., see above list). For example,
one can use an enzyme which acts upon a substrate to generate a product with a measurable physical
property. Examples of the latter include, but are not limited to, beta-galactosidase, alkaline phosphatase,
papain and peroxidase. For in situ hybridization studies, ideally the final product is water insoluble. Other
labels, e.g., dyes, will be evident to one having ordinary skill in the art.

The label will be linked to the intercalator compound, e.g., acridine dyes, phenanthridines, phenazines, 
furocoumarins, phenothiazines and quinolines, by direct chemical linkage such as involving covalent bonds, 
or by indirect linkage such as by the incorporation of the label in a microcapsule or liposome which in turn 
is linked to the intercalator compound. Methods by which the label is linked to the intercalator compounds
are essentially known in the art and any convenient method can be used to perform the present invention.

Advantageously, the intercalator compound is first combined with label chemically and thereafter 
combined with the nucleic acid component. For example, since biotin carries a carboxyl group, it can be 
combined with a furocoumarin by way of amide or ester formation without interfering with the photochemical 
reactivity of the furocoumarin or the biological activity of the biotin, e.g.,

Other aminomethylangelicin, psoralen and phenanthridium derivatives can be similarly reacted, as can
phenanthridium halides and derivatives thereof such as aminopropyl methidium chloride, i.e.,

(see Hertzberg et al, J. Amer. Chem. Soc., 104, 313 (1982)).

Alternatively, a bifunctional reagent such as dithiobis succinimidyl propionate or 1,4-butanediol
diglycidyl ether can be used directly to couple the photochemically reactive molecule with the label where
the reactants have alkyl amino residues, again in a known manner with regard to solvents, proportions and
reaction conditions. Certain bifunctional reagents, possibly glutaraldehyde, may not be suitable because,
while they couple, they may modify nucleic acid and thus interfere with the assay. Routine precautions can
be taken to prevent such difficulties.

The particular sequence used in making the labeled nucleic acid can be varied. Thus, for example, an 
amino-substituted psoralen can first be photochemically coupled with a nucleic acid, the product having 
pendant amino groups by which it can be coupled to the label, i.e., labeling can be carried out by 
photochemically reacting a DNA binding ligand with the nucleic acid in the test sample. Alternatively, the
psoralen can first be coupled to a label such as an enzyme and then to the nucleic acid. 

As described in pending EP-patent application No. 85 116 199.2, filed December 18, 1985, the present 
invention also encompasses a labeled nucleic acid comprising (a) a nucleic acid component, (b) a nucleic 
acid-binding ligand photochemically linked to the nucleic acid component, (c) a label and (d) a spacer 
chemically linking (b) and (c). 

Advantageously, the spacer includes a chain of up to about 40 atoms, preferably about 2 to 20 atoms, 
selected from the group consisting of carbon, oxygen, nitrogen and sulfur. 

Such spacer may be the polyfunctional radical of a member selected from the group consisting of
peptide, hydrocarbon, polyalcohol, polyether, polyamine, polyimine and carbohydrate, e.g., -glycyl-glycyl-glycyl-
or other oligopeptide, carbonyl dipeptides, and omega-amino-alkane-carbonyl radical such as
-NH-(CH2)5-CO, a spermine or spermidine radical, an alpha, omega-alkanediamine radical such as
-NH-(CH2)6-NH or -HN-CH2-CH2-NH, or the like. Sugar, polyethylene oxide radicals, glyceryl, pentaerythritol, and like
radicals can also serve as the spacers.

These spacers can be directly linked to the nucleic acid-binding ligand and/or the label or the linkages 
may include a divalent radical of a coupler such as dithiobis succinimidyl propionate, 1,4-butanediol 
diglycidyl ether, a diisocyanate, carbodiimide, glyoxal, glutaraldehyde, or the like.

The spacer can be incorporated at any stage of the process of making the probe a-b-d-c
defined hereinabove. Thus, the sequence can be any of the following:
a + b + d + c,
b + d + c + a,
d + c + b + a,
b + d + a + c, etc.

The conditions for the individual steps are well known in chemistry. 
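
One way to read the list of sequences above is that the linear product a-b-d-c can be assembled in any order in which each newly added component is joined to a neighbor already in place. The sketch below enumerates the orders allowed under that reading. The adjacency rule and the function names are an interpretation made for illustration, not something the text states.

```python
from itertools import permutations

# Linear product: nucleic acid (a), binding ligand (b), spacer (d), label (c).
CHAIN = ("a", "b", "d", "c")


def is_contiguous_build(order, chain=CHAIN):
    """True if every component after the first is joined next to one already placed."""
    index = {part: i for i, part in enumerate(chain)}
    low = high = index[order[0]]
    for part in order[1:]:
        i = index[part]
        if i == low - 1:
            low = i
        elif i == high + 1:
            high = i
        else:
            return False
    return True


def contiguous_build_orders(chain=CHAIN):
    """List every assembly order that grows the chain one neighbor at a time."""
    return [
        " + ".join(order)
        for order in permutations(chain)
        if is_contiguous_build(order, chain)
    ]


if __name__ == "__main__":
    for order in contiguous_build_orders():
        print(order)
```

Under this assumed rule the four orders listed in the text are all permitted, along with four others covered by "etc."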

If the label is an enzyme, for example, the product will ultimately be placed on a suitable medium and 
the extent of catalysis will be determined. Thus, if the enzyme is a phosphatase, the medium could contain 
nitrophenyl phosphate and one would monitor the amount of nitrophenol generated by observing the color. 
If the enzyme is a beta-galactosidase, the medium can contain o-nitrophenyl-D-galactopyranoside which
also will liberate nitrophenol.

The labeled nucleic acid of the present invention is applicable to all conventional hybridization assay
formats, and in general to any format that is possible based on formation of a hybridization product or
aggregate comprising the labelled nucleic acid. In particular, the unique labelled nucleic acid of the present
invention can be used in solution and solid-phase hybridization formats, including, in the latter case, formats
involving immobilization of either sample or probe nucleic acids and sandwich formats.

The nucleic acid probe will comprise at least one single-stranded base sequence substantially
complementary to or homologous with the sequence to be detected. However, such base sequence need
not be a single continuous polynucleotide segment, but can be comprised of two or more individual
segments interrupted by nonhomologous sequences. These nonhomologous sequences can be linear or
they can be self-complementary and form hairpin loops. In addition, the homologous region of the probe
can be flanked at the 3' and 5' termini by nonhomologous sequences, such as those comprising the DNA
or RNA of a vector into which the homologous sequence had been inserted for propagation. In either
instance, the probe as presented as an analytical reagent will exhibit detectable hybridization at one or
more points with sample nucleic acids of interest. Linear or circular single-stranded polynucleotides can be
used as the probe element, with major or minor portions being duplexed with a complementary
polynucleotide strand or strands, provided that the critical homologous segment or segments are in
single-stranded form and available for hybridization with sample DNA or RNA. Useful probes include linear or
circular probes wherein the homologous probe sequence is in essentially only single-stranded form (see
particularly, Hu and Messing, Gene, 17:271 (1982)).

The nucleic acid probe of the present invention can be used in any conventional hybridization
technique. As improvements are made and conceptually new formats are developed, such can be readily 
applied to the present probes. Conventional hybridization formats which are particularly useful include those 
wherein the sample nucleic acids or the polynucleotide probe is immobilized on a solid support (solid-phase 
hybridization) and those wherein the polynucleotide species are all in solution (solution hybridization). 

In solid-phase hybridization formats, one of the polynucleotide species participating in hybridization is
fixed in an appropriate manner in its single-stranded form to a solid support. Useful solid supports are well
known in the art and include those which bind nucleic acids either covalently or noncovalently. Noncovalent
supports which are generally understood to involve hydrophobic bonding include naturally occurring and
synthetic polymeric materials, such as nitrocellulose, derivatized nylon and fluorinated polyhydrocarbons, in
a variety of forms such as filters, beads or solid sheets. Covalent binding supports (in the form of filters,
beads or solid sheets, just to mention a few) are also useful and comprise materials having chemically
reactive groups or groups, such as dichlorotriazine, diazobenzyloxymethyl, and the like, which can be
activated for binding to polynucleotides.

It is well known that noncovalent immobilization of an oligonucleotide is ineffective on a solid support,
for example, on nitrocellulose paper. The present invention also describes novel methods of oligonucleotide
immobilization. This is achieved by phosphorylation of an oligonucleotide by a polynucleotide kinase or by
ligation of the 5'-phosphorylated oligonucleotide to produce multi-oligonucleotide molecules capable of
immobilization. The conditions for kinase and ligation reaction have been described in standard text books,
e.g., Molecular Cloning, T. Maniatis, E.F. Fritsch and J. Sambrook, Cold Spring Harbor Laboratory, (1982),
pages 1-123.

A typical solid-phase hybridization technique begins with immobilization of sample nucleic acids onto 
the support in single-stranded form. This initial step essentially prevents reannealing of complementary 
strands from the sample and can be used as a means for concentrating sample material on the support for 
enhanced detectability. The polynucleotide probe is then contacted with the support and hybridization 
detected by measurement of the label as described herein. The solid support provides a convenient means
for separating labelled probe which has hybridized to the sequence to be detected from that which has not 
hybridized. 

Another method of interest is the sandwich hybridization technique wherein one of two mutually
exclusive fragments of the homologous sequence of the probe is immobilized and the other is labelled. The
presence of the polynucleotide sequence of interest results in dual hybridization to the immobilized and
labelled probe segments. See Methods in Enzymology, 65:468 (1980) and Gene, 21:77-85 (1983) for
further details.

For the present invention, the immobile phase of the hybridization system can be a series or matrix of 
spots of known kinds and/or dilutions of denatured DNA. This is most simply prepared by pipetting
appropriate small volumes of native DNA onto a dry nitrocellulose or nylon sheet, floating the sheet on a
sodium hydroxide solution to denature the DNA, rinsing the sheet in a neutralizing solution, then baking the
sheet to fix the DNA. Before DNA:DNA hybridization, the sheet is usually treated with a solution that inhibits
non-specific binding of added DNA during hybridization.

This invention involves the labeling of whole genomic DNA, whole nucleic acids present in cells, whole
cell lysate, or unlysed whole cells. Once the labeled material is prepared, it can be used for the detection, 
i.e., the presence or absence of certain specific genomic sequences by specific nucleic acid hybridization 
assays. 

In one method according to the invention, cells separated from a human sample, or the human sample
itself, are treated by mixing with a photochemically reactive nucleic acid binding intercalating
ligand. The mixture is incubated for a time that depends on the type of sample. If the sample is lysed cells
or nucleic acids, it is incubated for a period of a few seconds to five minutes; when whole cells or
partially lysed cells are used, incubation for two minutes to two hours is employed. After the mixing
and incubation, the whole sample mixture is irradiated at a particular wavelength for the covalent interaction
between the photochemically reactive DNA binding ligand and the test sample. Then this labeled material is
hybridized under specific hybridization conditions with a specific probe.

After the hybridization, the unreacted unhybridized labeled test sample is removed by washing. After
the washing, the hybrid is detected through the label carried by the test sample, which is specifically
hybridized with a specific probe.

The present invention is surprising since in a human genomic sample the amount of a single copy gene
is very low. For example, if a restriction fragment of one thousand base pairs is the region of hybridization,
the probability of such sequence in the whole human genomic sample is about one in a million. This conclusion
has been derived by assuming from the literature that a human genomic sample has 3 x 10^9 base pairs, so that
1000 base pairs will be 1/3,000,000 of that number. Accordingly, in a sample of human DNA containing
approximately five to ten micrograms of nucleic acids, only 5 to 10 picograms of the corresponding
sequences is available, and labeling the vast majority of the non-specific DNA should produce more
background than the true signal. But after the reaction, it is surprising to observe that the results are not
only specific, but also of unexpectedly higher sensitivity.
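
The estimate above can be checked with a short calculation. The sketch below is illustrative only, and the function and constant names are assumptions. With the rounded one-in-a-million fraction, 5 to 10 micrograms of DNA carries 5 to 10 picograms of a 1000 base pair target; with the fraction 1/3,000,000 implied by a genome of 3 x 10^9 base pairs, the same calculation gives roughly 1.7 to 3.3 picograms. Either way the target amounts to only a few picograms.

```python
HUMAN_GENOME_BP = 3e9  # genome size assumed in the text (3 x 10^9 base pairs)
PG_PER_UG = 1e6        # picograms per microgram


def target_mass_pg(sample_ug, target_bp, genome_bp=HUMAN_GENOME_BP):
    """Mass in picograms of a single-copy target within a genomic DNA sample."""
    fraction = target_bp / genome_bp
    return sample_ug * PG_PER_UG * fraction


if __name__ == "__main__":
    for sample_ug in (5, 10):
        exact = target_mass_pg(sample_ug, 1000)
        rounded = target_mass_pg(sample_ug, 1000, genome_bp=1e9)
        print(f"{sample_ug} ug sample: {exact:.1f} pg at 1/3,000,000, "
              f"{rounded:.1f} pg at 1/1,000,000")
```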

Without wishing to be bound by any particular theory of operability, the reason for the unexpected
sensitivity may be due to the formation of a network of non-specific nucleic acid hybrids bound to the
specific hybrid, thus amplifying the amount of the signal. As has been shown in a typical example, a plasmid
containing a 19 nucleotide long specific sequence is immobilized and hybridized with a 5 microgram
equivalent of a test sample which is labeled photochemically, and one very easily detects the signal resulting
from such hybrid. This could not have been accomplished by any other technique because of the problems
associated with the labeling method.

The present invention relates to a novel hybridization technique where probes are immobilized and a
eukaryotic nucleic acid sample is labeled and hybridized with immobilized unlabeled probe. A surprising
characteristic of the invention is the ability to detect single or multiple copy gene defects by labeling the
test sample. Since there is no requirement for an excess of labeled hybridizing sequence, the present
method is more specific. In the present invention, simultaneous detection of different gene defects can be
easily carried out by immobilizing specific probes.

For example, using the present invention, one can immobilize oligonucleotide probes specific for
genetic defects related to sickle cell anemia and probes for alpha-thalassemias on a sheet of nitrocellulose
paper, label the test sample and hybridize the labeled test sample with the immobilized probes. It is
surprising that partially purified or unpurified nucleic acid samples (cell lysate or whole cell) can be
photochemically labeled with sensitive molecules without affecting the specific hybridizability.

The present invention is also directed to detecting eukaryotes (protists) in samples from higher
organisms, such as animals or humans.

Eukaryotes include algae, protozoa, fungi and slime molds.

The term "algae" refers in general to chlorophyll-containing protists, descriptions of which are found in
G.M. Smith, Cryptogamic Botany, 2nd ed. Vol. 1, Algae and Fungi, McGraw-Hill, (1955).

Eukaryotic sequences according to the present invention include all disease sequences except for
bacteria and viruses. Accordingly, genetic diseases, for example, would also be embraced by the present
invention. Non-limiting examples of such genetic diseases are as follows:

Area Affected     Diseases

Metabolism        Acute intermittent porphyria
                  Variegate porphyria
                  alpha-1-antitrypsin deficiency
                  Cystic fibrosis
                  Phenylketonuria
                  Tay-Sachs disease
                  Mucopolysaccharidosis I
                  Mucopolysaccharidosis II
                  Galactosaemia
                  Homocystinuria
                  Cystinuria
                  Metachromatic leucodystrophy

Nervous System    Huntington's chorea
                  Neurofibromatosis
                  Myotonic dystrophy
                  Tuberous sclerosis
                  Neurogenic muscular atrophies

Blood             Sickle-cell anaemia
                  Beta-thalassaemia
                  Congenital spherocytosis
                  Haemophilia A

Bowel             Polyposis coli

Kidney            Polycystic disease

Eyes              Dominant blindness
                  Retinoblastoma

Ears              Dominant early childhood deafness
                  Dominant otosclerosis

Circulation       Monogenic hypercholesterolemia

Teeth             Dentinogenesis imperfecta
                  Amelogenesis imperfecta

Skeleton          Diaphysial aclasia
                  Thanatophoric dwarfism
                  Osteogenesis imperfecta
                  Marfan syndrome
                  Achondroplasia
                  Ehlers-Danlos syndrome
                  Osteopetrosis tarda
                  Cleft lip/palate

Skin              Ichthyosis

Locomotor         Muscular dystrophy

A nucleic acid probe in accordance with the present invention is a sequence which can determine the 
sequence of a test sample. The probes are usually DNA, RNA, mixed copolymers of ribo- and
deoxyribonucleic acids, oligonucleotides containing ribonucleotides or deoxyribonucleotide residues or their
modified forms. The sequence of such a probe should be complementary to the test sequence. The extent
of complementary properties will determine the stability of the double helix formed after hybridization. The 
probe can also have covalently linked non-complementary nucleic acids. They can serve as the sites of the 
labeling reaction. 

The nucleic acid is preferably labeled by means of photochemistry, employing a photoreactive DNA- 
binding furocoumarin or a phenanthridine compound to link the nucleic acid to a label which can be "read" 
or assayed in conventional manner, including fluorescence detection. 

One use of the present invention is the identification of bacterial species in biological fluids. In one 
application, samples of urine from subjects having or suspected of having urinary tract infections can 
provide material for the preparation of labelled DNA(s) or RNAs, while a solid support strip, e.g., made of
nitrocellulose or nylon, can contain individual dots or spots of known amounts of denatured purified DNA 
from each of the several bacteria likely to be responsible for infection. 
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The format of labeled unknown and unlabeled probes, which is the converse of standard schemes, 
allows one to identify, among a number of possibilities, the species of organism in a sample with only a
single labeling. It also allows simultaneous determination of the presence of more than one distinguishable
bacterial species in a sample (assuming no DNA in a mixture is discriminated against in the labeling
procedure). However, it does not readily give more than an estimate of the amount of DNA (and,
therefore, the concentration of bacteria) in a mixed sample. For such quantitation, sample DNA is
immobilized in a series of dilution spots along with spots of standard DNA, and probe DNAs are labeled.

A urinary tract infection (UTI) is almost always due to monoclonal growth of one of the following half dozen
kinds of microorganism: Escherichia coli (60-90% of UTI), Proteus spp. (5-20% of UTI), Klebsiella spp.
(3-10% of UTI), Staphylococcus spp. (4-20% of UTI), and Streptococcus spp. (2-5% of UTI). Pseudomonas and
some other gram negative rods together account for a low percentage of UTI. A common contaminant of
urine samples that is a marker of improper sample collection is Lactobacillus.

The concentration of bacteria in a urine sample that defines an infection is about 10^5 per milliliter.
The format for an unlabeled probe hybridization system applicable to urinary tract infections is to have
a matrix of DNAs from the above list of species, denatured and immobilized on a support such as
nitrocellulose, in a range of amounts appropriate for the concentrations of bacterial DNA that can be
expected in samples of labelled unknown.

Standard hybridization with biotinylated whole genome DNA probes takes place in 5-10 ml, at a probe
concentration of about 0.1 ug/ml; hybridization of probe to a spot containing about 10 ng denatured DNA is
readily detectable. There is about 5 fg of DNA per bacterial cell, so that for a sample to contain 1 ug of
labelled DNA, it is necessary to collect 2 x 10^8 bacteria. If an infection produces urine having approximately
10^5 bacteria/ml, then the bacterial DNA to be labeled must be concentrated from 2000 ml. If more than
10 ng of unlabeled probe DNA is immobilized in a dot, for example, 100 ng or 1 ug, or if the hybridization
volume is reduced, then the volume of urine required for the preparation of labeled unknown is
approximately a few tenths of a ml.
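
The arithmetic in the preceding paragraph can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the described method: the function names are hypothetical, and the default inputs (about 5 fg of DNA per bacterial cell and about 10^5 bacteria per ml of infected urine) are simply the approximate figures quoted above.

```python
FG_PER_UG = 1e9  # femtograms per microgram


def cells_needed(dna_ug, fg_per_cell=5.0):
    """Number of bacterial cells that together contain dna_ug micrograms of DNA."""
    return dna_ug * FG_PER_UG / fg_per_cell


def urine_volume_ml(dna_ug, bacteria_per_ml=1e5, fg_per_cell=5.0):
    """Volume of urine (ml) that must be concentrated to recover dna_ug of DNA."""
    return cells_needed(dna_ug, fg_per_cell) / bacteria_per_ml


if __name__ == "__main__":
    # 1 ug of labelled DNA at 5 fg per cell
    print(cells_needed(1.0))       # 2 x 10^8 cells
    # the same amount of DNA from urine at 10^5 bacteria/ml
    print(urine_volume_ml(1.0))    # 2000 ml
```

Because the required volume scales linearly with the amount of labelled DNA, any reduction in the DNA needed per hybridization reduces the urine volume by the same factor.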

A strip of dots containing immobilized, denatured, unlabelled probe DNAs could have the following 
configuration: 
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This procedure involves the labeling of DNA or RNA in a crude cell lysate. Ideally, preparation of 
labeled sample DNA or RNA will accommodate the following points: 
(1) bacteria will be concentrated from a fluid sample by centrifugation or filtration;

(2) bacteria will be lysed under conditions sufficient to release nucleic acids from the most refractory
of the organisms of interest;

(3) the labeling protocol will not require purification of labeled nucleic acids from unincorporated
precursors, nor the purification of nucleic acids prior to labeling;

(4) the labeling protocol will be sufficiently specific for DNA and/or RNA that proteins, lipids and
polysaccharides in the preparation will not interfere with hybridization or read-out.

In the present invention, there is provided a method for efficiently and rapidly lysing whole cells, 
including Gram positive bacteria. The method involves contacting cells, e.g., whole cells, with an alkali, e.g., 
sodium or potassium hydroxide solution in a concentration of 0.1 to 1.6 Normal. 

The important features of the present lysis protocol are its relative simplicity and speed. It employs a
common chemical that requires no special storage conditions and it lyses even Gram positive organisms 
with high efficiency, while preserving the properties of the DNA that are important for subsequent steps in 
the photochemical labeling process. 
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For the present invention, the immobile phase of the hybridization system can be a series or matrix of 
spots of known kinds and/or dilutions of denatured DNA. This is most simply prepared by pipetting 
appropriate small volumes of native DNA or oligonucleotides onto a dry nitrocellulose or nylon sheet, 
floating the sheet on a sodium hydroxide solution to denature the DNA, rinsing the sheet in a neutralizing 
solution, then baking the sheet to fix the DNA. Before DNA:DNA hybridization, the sheet is usually treated
with a solution that inhibits nonspecific binding of added DNA during hybridization.

The invention will be further described in the following non-limiting examples wherein parts are by 
weight unless otherwise expressed. 

Examples 

Example 1: Preparation of Labelling Compound 

The preparation of the labeling compound required 1-amino-17-N-(biotinylamido)-3,6,9,12,15-pentaoxaheptadecane.
This compound was prepared in the following four steps:

(a) 3,6,9,12,15-pentaoxaheptadecane-1,17-diol ditosylate was synthesized.

(b) The 1,17-diphthalimido derivative of 3,6,9,12,15-pentaoxaheptadecane was prepared.

(c) The 1,17-diamino derivative of 3,6,9,12,15-pentaoxaheptadecane was prepared.

(d) The 1-amino-17-biotinylamido derivative of 3,6,9,12,15-pentaoxaheptadecane was prepared.

Example 1(a): Preparation of 3,6,9,12,15-Pentaoxaheptadecane-1,17-diol Ditosylate

To a stirred solution containing 50 g of hexaethylene glycol (0.177 mol) and 64 ml of triethylamine
(39.36 g, 0.389 mol) in 400 ml of CH2Cl2 at 0°C was added dropwise a solution containing 73.91 g of
p-toluenesulfonyl chloride (0.389 mol) in 400 ml of CH2Cl2 over a 2.5 hour period. The reaction mixture was
then stirred for one hour at 0°C and then warmed to ambient temperature for 44 hours. The mixture was then
filtered and the filtrate was concentrated in vacuo. The resulting heterogeneous residue was suspended in
500 ml of ethyl acetate and filtered. The filtrate was then concentrated in vacuo to a yellow oil which was
triturated eight times with 250 ml portions of warm hexane to remove unreacted p-toluenesulfonyl chloride.
The resulting oil was then concentrated under high vacuum to yield 108.12 g of a yellow oil (quantitative
yield).

Analysis: Calculated for C26H38O11S2
Calc: C, 52.87; H, 6.48.
found: C, 52.56; H, 6.39.

PMR: (60 MHz, CDCl3) δ: 2.45 (s, 6H); 3.4-3.8 (m, 20H); 4.2 (m, 4H); 7.8 (AB quartet, J = 8Hz, 8H).
IR: (neat) cm-1: 2870, 1610, 1360, 1185, 1105, 1020, 930, 830, 785, 670.
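
The calculated elemental analysis values can be reproduced from a molecular formula. The Python sketch below is offered only as an illustration: it assumes that the ditosylate has the molecular formula C26H38O11S2 (hexaethylene glycol with both hydroxyl groups tosylated), uses rounded standard atomic weights, and the function names are hypothetical. The optional `water_molecules` argument is an added convenience for hydrates such as those reported in the later examples.

```python
import re

# Rounded standard atomic weights in g/mol (only the elements needed here).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Parse a simple formula such as 'C26H38O11S2' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def percent_composition(formula, water_molecules=0.0):
    """Weight percent of each element, optionally for a hydrate (e.g. 0.5 H2O)."""
    counts = parse_formula(formula)
    counts["H"] = counts.get("H", 0) + 2 * water_molecules
    counts["O"] = counts.get("O", 0) + water_molecules
    total = sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_WEIGHT[el] * n / total for el, n in counts.items()}


if __name__ == "__main__":
    composition = percent_composition("C26H38O11S2")
    print(f"C, {composition['C']:.2f}; H, {composition['H']:.2f}")
```

Under these assumptions the carbon and hydrogen percentages agree with the calculated values given above to within rounding.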

Example 1(b): Preparation of 1,17-Diphthalimido-3,6,9,12,15-pentaoxaheptadecane

A stirred suspension containing 108 g of 3,6,9,12,15-pentaoxaheptadecane-1,17-diol ditosylate (0.183
mol), 74.57 g of potassium phthalimide (0.403 mol), and 700 ml of dimethylacetamide was heated at
160-170°C for 2 hours and was then cooled to room temperature. The precipitate was filtered and washed with
water and acetone to yield 53.05 g of product as a white powder, which was dried at 55°C (0.1 mm), mp
124-126°C.

A second crop of product was obtained from the dimethylacetamide filtrate by evaporation in vacuo; the
resulting precipitate was successively washed with ethyl acetate, water, and acetone. The resulting
white powder was dried at 55°C (0.1 mm) to yield an additional 9.7 g of product, mp 124.5-126.5°C. The
combined yield of product was 62.82 g (68% yield).
Analysis: (For first crop)
Calculated for C28H32N2O9.1/2 H2O
Calc: C, 61.19; H, 6.05; N, 5.09.
found: C, 61.08; H, 6.15; N, 5.05.
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(For second crop) 
Calculated for C28H32N2O9
Calc.: C, 62.21; H, 5.97; N, 5.18.
found: C, 61.78; H, 6.15; N, 5.13.
PMR: (60 MHz, dmso-d6) δ: 3.5 (s, 8H); 3.6 (s, 8H); 3.8 (bt, J = 3Hz, 8H); 8.1 (s, 8H).
IR: (KBr) cm-1: 2890, 1785, 1730, 1400, 1100, 735.



Example 1(c): Preparation of 1,17-Diamino-3,6,9,12,15-Pentaoxaheptadecane

A solution containing 60 g of 1,17-diphthalimido-3,6,9,12,15-pentaoxaheptadecane (0.118 mol), 14.8 g of
hydrazine hydrate (0.296 mol), and 500 ml of ethanol was heated with mechanical stirring in a 100°C oil
bath for three hours. The mixture was then cooled and filtered. The resultant filter cake was washed four
times with 300 ml portions of ethanol. The combined filtrates were concentrated to yield 32.35 g of a yellow
opaque glassy oil. Evaporative distillation at 150-200°C (0.01 mm) gave 22.82 g of a light yellow oil
(69% yield); lit. b.p. 175-177°C (0.07 mm).
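
The reported percent yield can be checked from the stated molar amount of starting material. The Python sketch below is an illustration only: it assumes the diamine has the formula C12H28N2O5 (molecular weight about 280.36 g/mol) and that the diphthalimide (0.118 mol as stated) is the limiting reagent. The function name is hypothetical.

```python
def percent_yield(actual_g, limiting_mol, product_mw):
    """Percent yield from isolated mass, moles of limiting reagent and product MW."""
    theoretical_g = limiting_mol * product_mw
    return 100.0 * actual_g / theoretical_g


DIAMINE_MW = 280.36  # g/mol, assuming C12H28N2O5

if __name__ == "__main__":
    # 22.82 g of distilled diamine from 0.118 mol of diphthalimide
    print(round(percent_yield(22.82, 0.118, DIAMINE_MW)))  # prints 69
```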

PMR: (60 MHz, CDCl3) δ: 1.77 (s, 4H, NH2); 2.85 (t, J = 5Hz, 4H); 3.53 (t, J = 5Hz, 4H); 3.67 (m, 16H).
IR: (CHCl3) cm-1: 3640, 3360, 2860, 1640, 1585, 1460, 1350, 1250, 1100, 945, 920, 870.
Mass Spectrum: (EI) m/e = 281.2 (0.1%, M + 1).
(FAB) m/e = 281.2 (100%, M + 1).
Analysis: For C12H28N2O5.1/2 H2O
Calc: C, 49.80; H, 10.10; N, 9.68.
found: C, 50.36; H, 9.58; N, 9.38.

Literature Reference: W. Kern, S. Iwabachi, H. Sato and V. Bohmer, Makromol. Chem., 180, 2539 (1979).



Example 1(d): Preparation of 1-Amino-17-N-(Biotinylamido)-3,6,9,12,15-pentaoxaheptadecane

A solution containing 7.2 g of 1,17-diamino-3,6,9,12,15-pentaoxaheptadecane (25 mmol) in 75 ml of
DMF under an argon atmosphere was treated with 3.41 g of N-succinimidyl biotin (10 mmol) added in
portions over 1.0 hour. The resulting solution was stirred for four hours at ambient temperature. TLC (SiO2,
70:10:1 CHCl3-CH3OH-conc. NH4OH) visualized by dimethylaminocinnamaldehyde spray reagent showed
excellent conversion to a new product (Rf 0.18). The reaction mixture was divided in half and each half
was adsorbed onto SiO2 and flash-chromatographed on 500 g of SiO2-60 (230-400 mesh) using a 70:10:1
CHCl3-CH3OH-conc. NH4OH solvent mixture. Fractions containing the product were pooled and concentrated
to yield 2.42 g of a gelatinous, waxy solid. The product was precipitated as a solid from isopropanol-ether,
washed with hexane, and dried at 55°C (0.1 mm) to give 1.761 g of a white powder (35% yield).
Analysis: Calculated for C22H42N4O7S.3/2 H2O:
C, 49.51; H, 8.50; N, 10.49.
found: C, 49.59; H, 8.13; N, 10.39.

PMR: (90 MHz, dmso-d6) δ: 1.1-1.7 (m, 6H); 2.05 (t, J = 7Hz, 2H);
2.62 (t, J = 4Hz, 1H); 2.74 (t, J = 4Hz, 1H); 3.0-3.4 (m, 14H);
3.50 (s, 14H); 4.14 (m, 1H); 4.30 (m, 1H); 6.35 (d, J = 4Hz, 1H); 7.80 (m, 1H).

CMR: (22.5 MHz, dmso-d6) δ: 25.2, 28.0, 28.2, 35.1, 40.6, 55.3, 59.2, 61.1, 69.6, 69.8, 71.2, 162.7,
172.1.

IR: (KBr) cm-1: 2900, 2850, 1690, 1640, 1580, 1540, 1450, 1100.
Mass Spectrum (FAB) m/e: 507.3 (M + 1, 56%)



Example 2: Preparation of 4'-Biotinyl-PEG-4,5'-dimethylangelicin

A solution of 203 mg of 1-amino-17-N-(biotinylamido)-3,6,9,12,15-pentaoxaheptadecane (0.4 mmol) in 1
ml of DMF under an argon atmosphere was treated with 78 mg of N,N'-carbonyldiimidazole (0.48 mmol). The
resulting mixture was stirred for four hours and was then treated with 55 mg of
4'-aminomethyl-4,5'-dimethylangelicin hydrochloride (0.2 mmol), 140 ul of diisopropylethylamine, and 100 ul of DMF. The resulting
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mixture was stirred overnight at 50 °C. The mixture was then evaporated onto Si0 2 in vacuo and the 
resultant impregnated solid was flash-chromatographed on 60 g of SiO2 (230-400 mesh) eluted with 1.5
liters of 7% CH3OH-CHCl3 followed by 1 liter of 10% CH3OH-CHCl3. Fractions containing the product were
pooled and concentrated to yield 72 mg of a glassy solid (47% yield).
PMR: (90 MHz, dmso-d6) δ: 1.1-1.8 (m, 6H); 2.04 (bt, J = 7Hz, 2H); 2.5 (s, 6H); 2.56 (m, 1H); 2.74 (bd,
J = 4Hz, 1H); 2.8-3.4 (m, 14H); 3.40 (m, 14H); 4.14 (m, 1H); 4.25 (m, 1H); 4.40 (bd, J = 6Hz, 2H); 6.5 (m, 1H);
6.35 (s, 1H); 7.02 (s, 1H); 7.45 (d, J = 8Hz, 1H); 7.62 (d, J = 8Hz, 1H); 7.80 (m, 1H).

CMR: (22.5 MHz, dmso-d6) δ: 11.9, 18.9, 25.3, 28.2, 28.3, 33.4, 35.2, 55.4, 59.2, 61.0, 69.2, 69.6, 69.8,
70.0, 89.0, 107.8, 112.0, 113.1, 114.3, 120.6, 121.6, 153.6, 154.4, 155.6, 157.9, 159.5, 162.7, 172.1.
Literature Reference: F. Dall'Acqua, D. Vedaldi, S. Caffieri, A. Guiotto, P. Rodighiero, F. Baccichetti, F.
Carlassare and F. Bordin, J. Med. Chem., 24, 178 (1981).



Example 3 : Colorimetric or Chemiluminescent Detection of the Nucleic Acid Hybrids 

Example 3(a): Colorimetric Detection

Colorimetric detection of the biotinylated hybrids is carried out following the procedure and kit
developed by Bethesda Research Laboratories (BRL), Gaithersburg, Maryland 20877, U.S.A. The procedure
is described in detail in a manual supplied with the kit by BRL, entitled "Products for Nucleic Acid Detection",
"DNA Detection System Instruction Manual", Catalogue No. 8239SA.



Example 3(b): Chemiluminescent Detection

Chemiluminescent detection of the biotinylated hybrids is identical to the above method: the filters with
the hybrids are saturated with BSA (bovine serum albumin) by immersing the paper in 3% BSA at 42°C for
20 minutes. Excess BSA is removed by taking the paper out of the container and blotting it between two
pieces of filter paper. The paper is then incubated in a solution containing Streptavidin (0.25 mg/ml, 3.0 ml
total volume) for 20 minutes at room temperature. It is then washed three times with a buffer containing 0.1
M Tris-HCl, pH 7.5, 0.1 M NaCl, 2 mM MgCl2, 0.05% "TRITON X-100". Next the filter is incubated with
biotinylated horseradish peroxidase (0.10 mg/ml) for 15 minutes at room temperature. This is followed by
three washings with 0.1 M Tris-HCl, pH 7.5, 0.1 M NaCl, 2 mM MgCl2 and 0.05% "TRITON X-100", and one
washing with 10 mM Tris (pH 8.0) buffer. Chemiluminescent activation is conducted in two ways. (1) Spots
are punched out and the discs containing the DNA are placed in a microtiter plate with wells that are
painted black on the sides. After the punched paper circles are placed in the microtiter plate wells, 0.8 ml
of buffer containing 40 mM Tris and 40 mM ammonium acetate (pH 8.1) is added to each well. Then 10 ul of
a 1:1 mixture of 39 mM Luminol (in DMF) and 30 mM H2O2 (in water) is added. Light emission is recorded on
"POLAROID" instant film by exposing it directly in the film holder. Alternatively (2), the paper is soaked in
a solution containing a 1:1 mixture of 0.5 mM Luminol and H2O2 and wrapped with transparent "SARAN
WRAP". The light emission is recorded on "POLAROID" film as above.



Example 4: General Method of Labeling the Test Sample Nucleic Acids 

High molecular weight DNA from a patient's sample is isolated by a method described in U.S.P.
4,395,486 (Wilson et al), the entire contents of which are incorporated by reference herein. The nucleic acid
is dissolved in 10 mM borate buffer (pH 8.0) to a final concentration of approximately 20 ug/ml. To the
nucleic acid solution, "angelicin-peg-biotin" in aqueous solution is added to a final concentration of 10
ug/ml. The mixture is then irradiated at long wavelength for about 60 minutes using a black ray
UVL 56 lamp. The product is ready for hybridization without purification. However, the product can be
purified by dialysis or alcohol precipitation (U.S.P. 4,395,486), as is usual for nucleic acids.

Instead of nucleic acids, whole cell lysate can also be labeled following an identical procedure. The
lysis is conducted by boiling the cells with 0.1 N sodium hydroxide, followed by neutralization with
hydrochloric acid.
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When whole cells are used, the mixture of "PEG-ang-bio" and cells are incubated for at least 60 
minutes prior to irradiation for efficient transport of the ligands. Many different variations of the above 
described methods can be adopted for labeling. 

Example 5: 

Alpha-thalassemia is associated with gene deletion. The detection of gene deletion by hybridization in a
dot/slot blot format requires that the total amount of sample and its hybridizability are accurately known.
Since the beta-globin gene is a single copy gene, simultaneous hybridization of a sample with beta-globin
and alpha-globin probes and comparison of the relative signals will indicate the amount of alpha-globin in the sample.

The format and hybridization conditions are the same as in Rubin and Kan, supra, except that probes, not test
DNA, are immobilized. The detection is done by using the BRL kit described supra following BRL's specifications.
The hybridization and detection process is conducted in three steps as follows:



Step 1 : Immobilization of the Probes 

As described in Rubin and Kan, supra, a 1.5 kb PstI fragment containing the alpha-globin gene is used as a
probe for alpha-thalassemia, and for the beta-globin gene a 737 base pair probe produced by the digestion
of pBR beta Pst (4.4 kb) is used. The beta-globin gene probe has been described in U.S.P. 4,395,486
(column 4). Ordinarily, the detection of a gene deletion related to alpha-thalassemia requires knowledge of
the amount of starting nucleic acid and of the hybridization efficiency, as well as control samples. The
present invention avoids these problems by simultaneous hybridization with a single copy essential gene
(e.g., the beta-globin gene). When similar amounts of the probes are immobilized side by side and the labeled
sample is hybridized, the relative signal intensity is a measure of the relative gene dosage present in the sample.

The probes (0.5, 1, 3 and 5 ug per 100 ul) are suspended in 10 mM Tris-HCl (pH 7) buffer and denatured
with 20 ul of 3 M sodium hydroxide at 100°C for 5 minutes. An equivalent volume of 2 M ammonium
acetate, pH 5.0, is added to neutralize the solution. Immediately after neutralization, the probes for the beta- and
alpha-globin genes are applied in parallel rows to nitrocellulose filter paper under vacuum in a slot blot
manifold purchased from Schleicher and Schuell (Keene, New Hampshire, U.S.A.). The filter is then dried in
vacuum at 80°C for 60 minutes. It is then prehybridized for 4 hours in a mixture containing 50 mM sodium
phosphate (pH 7), 45 mM sodium citrate, 450 mM sodium chloride, 50% (v/v) formamide, 0.2% each (w/v)
of polyvinylpyrrolidone, "FICOLL 400" and bovine serum albumin, 0.2 mg/ml alkali-boiled salmon sperm
DNA and 0.15 mg/ml yeast RNA.



Step 2: Labeling of the Test Sample 

This was described above. 



Step 3: Hybridization 

The nitrocellulose strip containing the immobilized probes is hybridized with the labeled test sample in
plastic bags (e.g., "SEAL-A-MEAL", "SEAL and SAVE", etc.). The hybridization solution is the same as the
prehybridization solution plus 10% dextran sulphate. Hybridization is done at 42°C for 16 hours. After
hybridization, detection of biotin is conducted with a kit and procedure supplied by Bethesda Research
Laboratory, Maryland, U.S.A. (catalogue No. 8239SA). The relative intensities of the alpha- and beta-regions
are used to estimate the extent of deletion of alpha-globin genes:

No signal on the alpha-globin side: all 4 alpha-globin genes missing.

Signal on the alpha-globin side is half as strong as on the corresponding beta-side: 3 alpha-globin
genes missing.

Signals on alpha and beta side equivalent: 2 alpha-globin genes missing.
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Signals on alpha side is stronger than the corresponding beta side (2 alpha = 3 beta): 1 alpha-globin 
gene missing. 
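
The interpretation rules listed above amount to a simple gene-dosage calculation. The Python sketch below is an illustration only and not part of the described procedure. It assumes that signal intensity is proportional to gene copy number, that equal amounts of the alpha- and beta-globin probes were immobilized, that the sample carries two copies of the beta-globin gene, and that a normal genome carries four alpha-globin genes. The function name is hypothetical, and the case of an alpha signal twice the beta signal (no genes missing) is an extrapolation not stated in the text.

```python
def alpha_genes_missing(alpha_signal, beta_signal, beta_copies=2, normal_alpha=4):
    """Estimate the number of deleted alpha-globin genes from relative signals."""
    if beta_signal <= 0:
        raise ValueError("beta-globin reference signal must be positive")
    ratio = alpha_signal / beta_signal
    # Scale the ratio by the known beta-globin copy number to get alpha copies.
    alpha_copies = round(ratio * beta_copies)
    # Clamp to the physically meaningful range 0..normal_alpha.
    alpha_copies = max(0, min(normal_alpha, alpha_copies))
    return normal_alpha - alpha_copies


if __name__ == "__main__":
    for alpha, beta in [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0), (1.5, 1.0)]:
        print(alpha, beta, alpha_genes_missing(alpha, beta))
```

In practice, measured intensities are noisy, so rounding to the nearest whole copy number is only a rough guide and would need to be supported by replicate spots at several probe amounts.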



Example 6: Immobilization of an Oligonucleotide Sequence Specific for Hemoglobin Mutation 






It is known that an oligonucleotide cannot be easily immobilized onto nitrocellulose paper by a simple 
adsorption process. The present invention encompasses three different methods to incorporate an 
oligonucleotide sequence into a larger molecule capable of adsorption. 

Method 1: Two oligonucleotides, one a 43mer and the other a 16mer, were chemically
synthesized in an automated synthesizer (Applied Biosystems 380B) by the phosphoramidite method and
phosphorylated at the 5' end by a T4 polynucleotide kinase mediated process according to Maniatis et al,
Molecular Cloning, page 122. These oligonucleotides contain a segment of a 19 nucleotide long sequence
specific for the detection of the mutation associated with sickle cell anemia.

43mer A & S (A = normal globin gene; S = sickle globin gene) were kinased according to Maniatis et
al, Molecular Cloning, page 122, in two separate reactions, namely, one with 32P-ATP and one with no
radioactive label. 0.4 ug 32P-43mer and 0.6 mg cold 43mer were mixed and purified on a spun column
(G-25 med in TE (Tris EDTA buffer)) to a final volume of 40 ul. Two dilutions were spotted on S & S
(Schleicher & Schuell) nitrocellulose and Nytran (nylon) membranes at 50 and 0.5 ng.

Method 2: The phosphorylated oligonucleotide products of method 1 were further elongated by making 
multimers of sequences by a ligase mediated process. The principle is described as follows: 



[Scheme not reproduced. Recoverable labels: linker (X); strands oriented 5' to 3'; ligase step; strand separation step.]
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The product being of a higher molecular weight than an oligonucleotide it should be immobilizable by 
adsorption on to a nitrocellulose paper. 

Aqueous solutions containing 4 ug of 32P-43mer and 37 ug of 16mer linker (X) were mixed and dried under
vacuum. 6 mg of cold kinased 43mer was added and the sample was heated to 55°C and cooled slowly to
0°C to anneal. Ligation was carried out in 20 ul total reaction volume with 800 units of ligase (Pharmacia) at
15°C for 4 hours. 1 mg (2 ul) was purified on a spun column (G-25 med in TE) to a final volume of 40 ul.
Two dilutions were spotted on nitrocellulose and nylon membranes at 50 and 0.5 ng.

Method 3: The same as method 2, but ligation was not conducted. Instead of ligation, cross linking was
conducted with an intercalator to keep the double stranded regions intact. Hence, the cross linked molecule
will have several oligonucleotide sequences covalently linked to each other.

2 ug of 32P-43mer (for sequence P-50) was added to 2.9 mg of a 16mer (for sequence P-50) linker and
purified on a spun column (G-25 med in TE) to a final volume of 40 ul. 6 mg of kinased 43mer was added
and the samples were heated to 55°C and cooled slowly to 0°C to anneal. 25 ul of the intercalation compound
aminomethyltrioxsalen was added and the sample was irradiated for 30 minutes on ice in 500 ul total of 10
mM borate buffer, pH 8.2, with a long wave UV lamp (model UVL-21, λ = 366 nm).

The probes modified by all three methods were then immobilized on nitrocellulose and nylon paper
and hybridized with labelled oligonucleotides. The results indicate that the sequences are immobilizable and
that hybridization fidelity remains intact.

Two dilutions of the products of methods 1 to 3 were spotted on nitrocellulose and nylon membranes at
50 and 0.5 ng.

Whole filters were baked for 30 minutes in an 80°C vacuum oven and prehybridized in blotto (5% nonfat
dry milk, 6XSSC, 20 mM Na-pyrophosphate) for 30 minutes in a 50°C oven.

Hybridization was carried out with primer extended 19A' & 19S' probes at 50°C for one hour (3
strips/probe).

Filters were stringently washed for 15 minutes at room temperature in 6XSSC with slight agitation and 2
x 10 minutes at 57°C.

Air-dried filters were placed on Whatman paper and autoradiographed at -70°C overnight.
The results presented in Fig. 1 surprisingly indicate that specific hybridization is obtained by immobilizing
oligonucleotide probes.

Example 7: Hybridization with Labeled Genomic DNA for Non Radioactive Detection

Human normal genomic (XX) DNA was photolabeled with "biotin-PEG-angelicin" (BPA) in 10 mM borate
buffer, pH 8.2, at a weight ratio of 0.3 to 1 (BPA:DNA) for 15 minutes on ice with a long wave UV lamp
(model UVL-21, λ = 366 nm). No purification is necessary.

Target DNA oligonucleotides were directly immobilized on S & S nitrocellulose in 1 ul aliquots at the
following concentrations, and then baked in an 80°C vacuum oven for 30 minutes. The amounts of the
different immobilized probes are as follows:
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200 


ng 
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50 
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43-mer A; 

5' CTGCT.MTCTTMGGAT : AAT«CTCCTGAGGAGAAGTCT GCT«AATCTTAA5,GGATjAAT«CT 3' 
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*-Tfor 43-mer S 
16-mer (Common to both A and S) 
3'-TTAGAATTCCTAAATT-5'
Filters were prehybridized in blotto (5% nonfat dry milk, 6XSSC, 20 mM Na-pyrophosphate) for 30
minutes in a 45°C H2O bath.

All 4 strips were hybridized in 2 ml of solution containing 2 ug labeled XX DNA containing the normal
beta-globin gene (hybridization solution was blotto with 10% PEG) for 2 hours at 45°C in an H2O bath.
A stringency wash was carried out as follows:
1 x 20' at room temperature in 6XSSC

2 x 20' at temperatures indicated in Fig. 2 with very little agitation.

50 ml centrifuge tubes were used for elevated temperature washes. Results are shown in Fig. 2.
Detection of biotin in the hybrid was carried out according to the Bethesda Research Laboratory,
Bethesda, Maryland, U.S.A., manual using their kit for biotin detection. The results indicated specific
hybridization.



Example 8: Immobilization of Whole Genomic DNA As Probes 

Tens of milligram to gram amounts of DNA were prepared in the following manner from bacterial cells
harvested from fermentor cultures. Bacteria were collected by centrifugation from 10 liter nutrient broth
cultures grown in a New Brunswick Scientific Microferm Fermentor. Generally, cells in concentrated
suspension were lysed by exposure to an ionic detergent such as SDS (Na dodecyl sulfate), then nucleic
acids were purified from proteins and lipids by extraction with phenol and/or chloroform (J. Marmur, J. Mol.
Biol., 3, 208-218, 1961). RNA was removed from the nucleic acid preparation by treatment of the DNA
solution with 0.2 mg/ml ribonuclease at 37°C, then DNA was precipitated from solution by the addition of
two volumes of ethanol. Bacterial DNA redissolved from the precipitate in a low salt buffer such as TE (10
mM Tris-HCl, pH 7.5, 1 mM Na2EDTA) was characterized with respect to purity, concentration and molecular
size, then approximately 1 microgram aliquots were denatured and immobilized as spots on nitrocellulose
or nylon membranes for hybridization (Kafatos et al., Nucleic Acids Res., 7, 1541-1552 (1979)).
Denaturation was accomplished by exposure of the DNA to approximately 0.1 N NaOH. After denaturation the
solution was neutralized, then the membrane was rinsed in NaCl/Tris-HCl, pH 7.5, and dried.



Example 9: Processing of a Test Sample for Cellular DNA Labeling

Samples of urine, for example (although the following can equally apply to suspensions of material from
gonorrhea-suspect swabs, from meningitis-suspect cerebrospinal fluid, from contamination-suspect water
samples, etc.), are centrifuged or filtered to wash and concentrate any bacteria in the sample. The bacteria
are then lysed by exposure to either (i) 2 mg/ml lysozyme or lysostaphin then exposure to approximately
90°C heat, (ii) 0.1 to 1.6 N NaOH, or (iii) 1% Na dodecyl sulfate. After (ii) NaOH, the cell lysate solution is
neutralized before labelling; after (iii) detergent lysis, DNA labelling is preceded by removal of the SDS with
0.5 M K acetate on ice. Angelicin should be able to permeate intact cells so that DNA labeling can be
accomplished before cell lysis. This in situ labeling simplifies the extraction procedure, as alkaline or
detergent lysates can be incorporated directly into a hybridization solution.

Prior to hybridization, the labeled sample is denatured, and it should also preferably be reduced to
short single stranded lengths to facilitate specific annealing with the appropriate unlabeled probe DNA.
Methods of denaturation are known in the art. These methods include treatment with sodium hydroxide,
organic solvent, heating, acid treatment and combinations thereof. Fragmentation can be accomplished in a
controlled way by heating the DNA to approximately 80°C in NaOH for a determined length of time, and
this, of course, also denatures the DNA.



Example 10 : Labeling of the Products of Example 9 

(i) A test sample of about 10 ml urine will contain 10^4 or more infectious agents. After separation by
centrifugation and washing, the pretreated cell lysate (step 2) was resuspended in 0.2 ml 10 mM sodium
borate buffer (pH approximately 8). To this suspension, 10 ug of photolabelling reagent dissolved in ethanol
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(10 mg/ml), was added' and mixed by shaking on a vortex mixer. The mixture was then irradiated at 365 nm 
for 30 minutes with a UVGL 25 device at its long wavelength setting. The UVGL device is sold by UVP Inc., 
5100 Walnut Grove Avenue, P.O. Box 1501, San Gabriel, CA 91778, U.S.A. 

(ii) The sample was also labeled with N-(4-azido-2-nitrophenyl)-N'-(N-d-biotinyl-3-aminopropyl)-N'-methyl-1,3-propanediamine
(commercially available from BRESA, G.P.O. Box 498, Adelaide, South Australia
5001, Australia), following the procedure described by Forster et al (1985), supra, for DNA.

(iii) When unlysed cells were used, the cell suspension in 0.2 ml 10 mM borate was incubated with 
the photoreagent for 1 hour prior to irradiation. 

Example 11: Hybridization of the Products of Examples 8 and 10

Prior to hybridization, the membrane with spots of denatured unlabeled probe DNA was treated for up
to 2 hours with a "prehybridization" solution to block sites in the membrane itself that could bind the
hybridization probe. This and the hybridization solution, which also contained denatured labeled sample
DNA, comprised approximately 0.9 M Na+, 0.1% SDS, 0.1-5% bovine serum albumin or nonfat dry
milk, and optionally formamide. With 50% formamide, the prehybridization and hybridization steps were
done at approximately 42°C; without, the temperature was approximately 68°C. Prehybridized membranes
can be stored for some time. DNA hybridization was allowed to occur for about 10 minutes or more, then
unbound labeled DNA was washed from the membrane under conditions such as 0.018 M Na+ (0.1 x SSC),
0.1% SDS, 68°C, that dissociate poorly base paired hybrids. After posthybridization washes, the membrane
was rinsed in a low salt solution without detergent in anticipation of hybridization detection procedures.



Example 12: Detection of a Nucleic Acid Hybrid with Immunogold

Affinity isolated goat antibiotin antibody (purchased from Zymed Laboratories, San Francisco, California,
U.S.A.) was adsorbed onto colloidal gold (20 nm) following the method described by its supplier (Janssen
instruction booklet, Janssen Life Sciences Products, Piscataway, New Jersey, U.S.A.) and reacted with
hybridized biotinylated DNA after blocking as in the colorimetric method. The signals were silver enhanced
using a Janssen (B2340 Beerse, Belgium) silver enhancement kit and protocol.



Example 13: Detection of Urinary Tract Infection in a Urine Sample 

Urine samples were collected from a hospital, where they were analyzed by microbiological methods.
The results were kept secret until the hybridization diagnosis had been conducted. The two sets of
results were then compared to ascertain the validity of the hybridization results.

1 ml of clinical sample (urine) suspected of UTI was centrifuged in a Brinkman microcentrifuge
for 5 minutes. Then 0.1 ml of 1.2 N sodium hydroxide was added and the suspension was heated
to 100°C to lyse the cells. The suspension was then diluted to 1 ml with 10 mM sodium borate buffer pH 8
and was neutralized with hydrochloric acid to a pH of 7. To the solution, 50 µg "biotin-PEG-angelicin" (see
Example 2) was added and the mixture was irradiated with a UVL 56 long wavelength UV lamp for 15
minutes. The irradiated sample (0.1 ml) was added to 3 ml of 3× SSC containing 5% nonfat dry milk,
10% PEG, and 0.2 M sodium pyrophosphate, and hybridization was conducted with probes (whole genomic
DNA) immobilized onto nitrocellulose paper at 68°C for 5 minutes to overnight. After hybridization,
detection was conducted according to Example 3 or 12, and the spots or the photographs were visually
interpreted for the presence of specific bacteria in the test sample. A spot of human DNA was also
present on the nitrocellulose paper for the detection of leucocytes. The presence of leucocytes was
further verified with a common method using "LEUKOSTIX" (Miles Laboratories, Elkhart, Indiana, U.S.A.).

Typical results (Tables 1 and 2) indicate that the hybridization diagnosis produces similar results in a
shorter time than the corresponding microbiological assays. The present invention provides not only
information related to species identification, but also the leucocyte content of a clinical sample.

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TABLE 1 

DIAGNOSIS OF CLINICAL URINE SAMPLES 

* HOSPITAL APPLICANTS' HYBRIDIZATION DETECTION 

DIAGNOSIS RESULTS SYSTEM 

NEG NEG GOLD 

NEG NEG GOLD 

NEG NEG GOLD 

NEG E.C.-M CHEMI 

NEG E.C.-VW GOLD 

NEG E.C.-VW GOLD 



s+, 


c- 


NEG 
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s+, 


C- 


E.c.-S 
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C- 
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C- 


NEG 


GOLD 


S+, C-                     NEG               GOLD

S+, C-                     E.c.-VW           GOLD

S+, C-                     NEG               GOLD

S+, C-                     E.c.-VW           GOLD

S+, C-                     NEG               GOLD

S+, C-                     NEG               GOLD

100,000/mL E.c.            E.c.-S            GOLD

100,000/mL E.c.            E.c.-S            CHEMI

100,000/mL E.c.            E.c.-W            GOLD

50,000/mL E.c.             E.c.-M            CHEMI

50,000/mL E.c.             NEG               GOLD

E. coli                    E.c.-S, Kl.-M     CHEMI

E. coli                    E.c.-VS, Kl.-S    CHEMI

E. coli                    E.c.-S, Kl.-S     CHEMI

E. coli/Klebsiella mix     E.c.-S, Kl.-W     GOLD

E. coli/Staph mix          E.c.-S, St.-M     CHEMI






0 235 726 



Table 1 cont'd

HOSPITAL                            APPLICANTS' HYBRIDIZATION    DETECTION
DIAGNOSIS                           RESULTS                      SYSTEM

Klebsiella spp.                     E.c.-M, Kl.-W                CHEMI
100,000/mL K. pneumoniae            E.c.-W, Kl.-VW               GOLD
Enterobacter spp.                   NEG**                        GOLD
100,000 Candida                     NEG**                        GOLD
100,000/mL Proteus                  Pr.-S, E.c.-W                GOLD
10,000/mL Strep                     NEG                          CHEMI
Mixture of 3 unidentified Gm(+)     NEG                          GOLD



* diagnosis conducted by streaking urine on an agar plate 
and treating the plate under conditions so that the 
infectious organism can grow. 

** Enterobacter/Candida probes not included in the 
hybridization assay, therefore, negative results are not 
surprising; given the high stringency conditions employed 
in the assay, cross-hybridization with species related to 
Enterobacter was not detected. 

Abbreviations: VS=very strong; S=strong; M=medium; 
W=weak; and VW=very weak hybridization signals; 
GOLD=detection method according to Example 12; 
CHEMI=chemiluminescent detection according to Example 
3(b) 

Applicants' hybridization results represent the results of a subjective interpretation of the intensity of the
hybridization signals obtained after detection. DNAs from the organisms listed in column two are the only
ones for which any hybridization signal was obtained. The panel of DNAs used for hybridization included
E. coli ("E.c."), Klebsiella pneumoniae ("Kl"), Proteus vulgaris ("Pr"), Pseudomonas aeruginosa,
Staphylococcus epidermidis ("SE"), Streptococcus faecalis and Homo sapiens.
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TABLE 2 

COMPARISON OF AMES LEUKOSTIX ASSAY
WITH APPLICANT'S ASSAY

"LEUKOSTIX"     APPLICANTS' HYBRIDIZATION     DETECTION
RESULT          RESULT                        SYSTEM

3+              VS                            GOLD
3+              S                             CHEMI
0+              S                             CHEMI
3+              M                             CHEMI
3+              M                             CHEMI
3+              S                             GOLD
3+              S                             GOLD
3+              VS                            GOLD
3+              VS                            GOLD
3+              VS                            GOLD
3+              VS                            GOLD
2+              S                             CHEMI
2+              S                             CHEMI
2+              S                             CHEMI
2+              S                             CHEMI
2+              S                             GOLD
2+              S                             GOLD
2+              S                             GOLD
2+              S                             GOLD
2+              S                             GOLD
1+              S                             GOLD
1+              VS                            GOLD
1+              VW                            GOLD
TRACE/1+        M                             CHEMI
TRACE           VS                            CHEMI
TRACE           W                             GOLD
TRACE           W                             GOLD
TRACE           VS                            GOLD
NEG             S                             CHEMI
NEG             S                             CHEMI
NEG             M                             CHEMI
NEG             VW                            GOLD
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Table 2 cont'd 



"LEUKOSTIX"     APPLICANTS' HYBRIDIZATION     DETECTION
RESULT          RESULT                        SYSTEM

NEG             VW                            GOLD
NEG             NEG                           GOLD
NEG             NEG                           GOLD
NEG             W                             GOLD
NEG             W                             GOLD
NEG             W                             GOLD



The hybridization results summarized in column 2 of Table 2 represent subjective interpretations of the 
intensity of hybridization signal obtained when labeled urine samples described in Table 1 were hybridized 
with genomic human DNA. 

The "LEUKOSTIX" assay is a colorimetric reagent strip assay. Color development on the reagent strip 
is compared to a chart provided with the assay reagent strips and ranges from negative (no color 
development) to 3+ (very strong color development). 



Example 14: Lysis of Cells

A 1.0 mL aliquot of cell suspension was centrifuged and the cell pellet resuspended in 100 µL of
unbuffered NaOH solution. The sample was then exposed to high temperature for a short time and then
diluted to the original volume using 10 mM borate buffer. The pH of the solution was then adjusted to
neutral with HCl.

Table 3 shows the efficiency of lysis of two different Gram positive cocci, Staphylococcus epidermidis
and Streptococcus faecalis, at varying NaOH concentrations at either 68°C or 100°C. In this Example, the
absorbance of the 1.0 mL aliquots at 600 nm was recorded before centrifugation. After centrifugation, the
cell pellets were resuspended in varying concentrations of NaOH (100 µL) and duplicate samples of each
exposed to 68°C for 10 minutes or 100°C for 5 minutes. Each sample was then diluted to 1.0 mL and the
absorbance at 600 nm again recorded. Since the beginning and ending volumes are identical, the beginning
and ending absorbance at 600 nm provides a direct measurement of lysis efficiency.
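
The specification does not state how the %LYSIS figures were calculated. A minimal sketch, assuming that percent lysis is simply the fractional drop in OD600 between the pre-treatment and post-treatment readings at equal volume, is given below in Python. The function name `lysis_percent` is hypothetical. This assumed formula reproduces several of the tabulated S. faecalis values to the nearest percent, but it should not be taken as the applicants' actual calculation.

```python
def lysis_percent(od_pre: float, od_post: float) -> float:
    """Estimate percent lysis from OD600 before and after treatment.

    Assumption (not stated in the source): lysis efficiency is the
    fractional loss of turbidity at constant sample volume.
    """
    if od_pre <= 0:
        raise ValueError("pre-treatment absorbance must be positive")
    return (1.0 - od_post / od_pre) * 100.0


# Two S. faecalis readings taken from Table 3 (100°C for 5 minutes).
readings = {"0 N": (0.475, 0.366), "0.2 N": (0.512, 0.194)}
for naoh, (pre, post) in readings.items():
    print(f"{naoh} NaOH: {lysis_percent(pre, post):.0f}% lysis")
```

Because the starting and final volumes are both 1.0 mL, no dilution correction is applied in this sketch.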

Whereas Gram negative organisms lysed efficiently in as low as 0.1 N NaOH, Table 3 shows clearly
that efficient lysis is a function of both NaOH concentration and temperature, such that higher NaOH
concentrations are required as the incubation temperature decreases. At 100°C (maximum temperature at 1
atmosphere) a concentration of at least 1.6 N NaOH was required for efficient lysis of S. epidermidis and
S. faecalis. If lower temperatures are desirable or necessary, then higher concentrations of NaOH will be
required to maintain lysis efficiency.
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TABLE 3 

EFFICIENCY OF LYSIS OF GRAM POSITIVE BACTERIA 
AT VARIOUS CONCENTRATIONS OF NaOH AT 68 °C and 100 °C 

Streptococcus faecalis

            100°C/5 Minutes                       68°C/10 Minutes
[NaOH]      OD600 PRE   OD600 POST   %LYSIS       OD600 PRE   OD600 POST   %LYSIS

0 N         0.475       0.366        23           0.512       0.357        30
0.1         0.509       0.261        50           0.513       0.238        54
0.2         0.512       0.194        62           0.514       0.259        50
0.4         0.504       0.175        65           0.513       0.150        71
0.8         0.506       0.113        78           0.505       0.147        71
1.2         0.498       0.082        84           0.498       0.150        70
1.6         0.487       0.061        88           0.426       0.099        77


Staphylococcus epidermidis

            100°C/5 Minutes                       68°C/10 Minutes
[NaOH]      OD600 PRE   OD600 POST   %LYSIS       OD600 PRE   OD600 POST   %LYSIS

0 N         0.667       0.558        16           0.690       0.560        19
0.1         0.681       0.396        42           0.701       0.441        37
0.2         0.674       0.296        60           0.699       0.414        41
0.4         0.699       0.183        74           0.730       0.309        58
0.8         0.705       0.091        87           0.715       0.187        74
1.2         0.680       0.070        90           0.719       0.090        88
1.6         0.693       0.035        95           0.660       0.040        94




It will be appreciated that the instant specification and claims are set forth by way of illustration and not
limitation, and that various modifications and changes may be made without departing from the spirit and 
scope of the present invention. 



Claims 

1. A method for detecting one or more microorganisms or polynucleotide sequences from eukaryotic
sources in a nucleic acid-containing test sample comprising

a) preparing a test sample comprising labeling the nucleic acids in the test sample,

b) preparing one or more probes by immobilizing an oligonucleotide or a single-stranded nucleic acid
of one or more known microorganisms or sequences from eukaryotic sources,

c) contacting, under hybridization conditions, the labeled single-stranded sample nucleic acid and the
immobilized oligonucleotide or single-stranded nucleic acid to form hybridized labeled nucleic acids, and

d) assaying for the hybridized nucleic acids by detecting the label.

2. A method according to claim 1, further comprising denaturing the labeled nucleic acids to form
labeled single stranded nucleic acids.

3. A method according to any of claims 1 and 2, wherein said eukaryotic sources are selected from the
group consisting of algae, protozoa, fungi, slime molds and mammalian genetic defects, such as alpha-
thalassemia and sickle cell anemia.

4. A method according to any of claims 1 to 3, wherein the labeling is conducted in a whole living cell 
or a cell lysate. 

5. A method according to claim 4, wherein the cell lysate is prepared by contacting a cell with alkali. 

6. A method according to any of claims 1 to 5, wherein the label is selected from the group consisting 
of protein binding ligands, haptens, antigens, fluorescent compounds, dyes, radioactive isotopes and 
enzymes. 

7. A method according to any of claims 1 to 6, wherein the immobilization is carried out by chemical 
reaction or physical adsorption. 
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8. A method according to any of claims 1 to 7, wherein the probe comprises the two or more known 
microorganisms or sequences from eukaryotic sources immobilized in the form of dots on a solid support 
strip. 

9. A method according to any of claims 1 to 8, wherein said labeling is carried out by photochemically
reacting a nucleic acid binding ligand with the nucleic acid in the test sample.

10. A kit for detecting one or more microorganisms or polynucleotide sequences from eukaryotic sources in
a test sample comprising in one or more containers

a) a solid support containing single-stranded nucleic acids of one or more known microorganisms or
polynucleotide sequences from eukaryotic sources immobilized thereon,

b) a reagent for labeling the nucleic acid of the test sample,

c) a reagent for denaturing nucleic acid in the test sample, and

d) hybridization reagents.
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Brandt, Alan 

From: Brandt, Alan 

Sent: Thursday, August 24, 2006 2:48 PM 
To: 'Wirtzberger, Paul Anthony' 

Subject: RE: REMINDER - FW: REMINDER - FW: REMINDER - FW: CONFIDENTIAL: Draft US Util Pat 
App - Your ref: HP PD 200313056-1 ; Our ref: 28250.04109 

Paul, 

Thanks for your attention to this matter. 

I have a few revisions to the application due to comments from the managing HP attorney (Mr. Gehman). 

After addressing these comments, I will provide you and the other inventors with the revised draft application and
Declaration and Assignment papers for execution. 

Regards, 
Al 



Original Message 

From: Wirtzberger, Paul Anthony [mailto:paul.wirtzberger@hp.com] 
Sent: Thursday, August 24, 2006 1:58 PM 
To: Brandt, Alan 

Subject: RE: REMINDER - FW: REMINDER - FW: REMINDER - FW: CONFIDENTIAL: Draft US Util Pat App 
- Your ref: HP PD 200313056-1 ; Our ref: 28250.04109 

I have reviewed the application and it looks good; I have no changes.
I also checked with the other two engineers and they have no changes.

paul 



Hello all, 

We acknowledge receipt of comments on the draft application from Mr. Gehman, managing HP
attorney.

However, we have not yet received comments from the inventors. 

Inventors - Please provide comments at your earliest convenience. 

Please let us know if you have any questions. 

Regards, 
Al

ALAN BRANDT 

CALFEE, HALTER & GRISWOLD LLP 
1400 McDonald Investment Center 
800 Superior Avenue 
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Cleveland, Ohio 441 14-2688 
Tel: 1-216-622-8658 
Fax: 1-216-241-0816 
abrandt@calfee.com
www.calfee.com 

The preceding e-mail message (including any attachments) contains information that may be 
confidential, be protected by the attorney-client or other applicable privileges, or constitute non- 
public information. It is intended to be conveyed only to the designated recipient(s). If you are not 
an intended recipient of this message, please notify the sender by replying to this message and then 
delete it from your system. Use, dissemination, distribution, or reproduction of this message by
unintended recipients is not authorized and may be unlawful. 

Original Message 

From: Brandt, Alan 

Sent: Tuesday, July 25, 2006 6:50 PM 

To: 'Paul A. Wirtzberger (paul.wirtzberger@hp.com)'; 'Gary Williams (gary.williams@hp.com)'; 'Rico
Brooks (rico.brooks@hp.com)'

Cc: 'Peggy Oyama (peggy.oyama@hp.com)'; 'Les Gehman (les.gehman@hp.com)'; Pejic, Ned
Subject: REMINDER - FW: REMINDER - FW: CONFIDENTIAL: Draft US Util Pat App - Your ref: HP 
PD 200313056-1 ; Our ref: 28250.04109 



Original Message 

From: Brandt, Alan 

Sent: Wednesday, July 12, 2006 2:31 PM 

To: Paul A. Wirtzberger (paul.wirtzberger@hp.com); Gary Williams (gary.williams@hp.com); Rico 
Brooks (rico.brooks@hp.com) 

Cc: Peggy Oyama (peggy.oyama@hp.com); Les Gehman (les.gehman@hp.com) 

Subject: REMINDER - FW: CONFIDENTIAL: Draft US Util Pat App - Your ref: HP PD 200313056-1 ; 

Our ref: 28250.04109 

Inventors, 

Please advise when you might be able to provide comments on the attached draft patent 
application. 

Regards, 
Al 

ALAN BRANDT 

CALFEE, HALTER & GRISWOLD LLP 
1400 McDonald Investment Center 
800 Superior Avenue 
Cleveland, Ohio 44114-2688
Tel: 1-216-622-8658 
Fax: 1-216-241-0816 
abrandt@calfee.com 
www.calfee.com 

Original Message 

From: Brandt, Alan 

Sent: Wednesday, June 14, 2006 2:38 PM 

To: 'Paul A. Wirtzberger (paul.wirtzberger@hp.com)'; 'Gary Williams (gary.williams@hp.com)'; 'Rico
Brooks (rico.brooks@hp.com)'

Cc: Pejic, Ned; 'Peggy Oyama (peggy.oyama@hp.com)'; 'Les Gehman (les.gehman@hp.com)' 
Subject: CONFIDENTIAL: Draft US Util Pat App - Your ref: HP PD 200313056-1 ; Our ref: 28250.04109

Paul, Gary & Rico, 

Attached is a draft of the subject U.S. utility patent application. The specification and claims are in 
the WORD document and the drawings are in the PowerPoint file. A table identifying the names 
associated with certain drawing reference numbers is also attached for your convenience to assist in 
going between the text and drawings. If applicable, please also distribute a copy of the application to 
any additional co-inventors for review. Additionally, please coordinate collective feedback on 
changes or comments from all co-inventors. 

Only individuals that have contributed to the claims can be listed as inventors on the application. 
Please call me if there needs to be a change in inventorship (either deletions or additions). 
Otherwise, if applicable, please have each inventor confirm that they have contributed to some 
portion of at least one of the claims by identifying the claim(s) and corresponding inventor(s). 

Please also carefully review the application to make sure that it describes the current best mode of 
the invention. As you may know, each inventor has an obligation to disclose to the US Patent & 
Trademark Office (USPTO) any information that may be material to the examination of the 
application including any prior art of which the inventor may be aware. Hence, please forward any 
such materials to me for filing with the USPTO. 

The HP attorney has requested an inventor-approved draft application by the end of June. 
Therefore, we request your review of the draft application at your earliest convenience so that we 
can meet this deadline. Please let us know if you will not be able to review the applications within
this timeframe so that we can provide the HP attorney with early notice of a delay in the inventor- 
approved draft. 

If you have any questions, please do not hesitate to contact us. Thank you for your time and 
cooperation in this matter. 

Regards, 

Al 

ALAN BRANDT 

CALFEE, HALTER & GRISWOLD LLP 
1400 McDonald Investment Center 
800 Superior Avenue 
Cleveland, Ohio 44114-2688 
Tel: 1-216-622-8658 
Fax: 1-216-241-0816 
abrandt@calfee.com 
www.calfee.com 

The preceding e-mail message (including any attachments) contains information that may be 
confidential, be protected by the attorney-client or other applicable privileges, or constitute non- 
public information. It is intended to be conveyed only to the designated recipient(s). If you are not 
an intended recipient of this message, please notify the sender by replying to this message and then 
delete it from your system. Use, dissemination, distribution, or reproduction of this message by 
unintended recipients is not authorized and may be unlawful. 






Brandt, Alan

From: Brandt, Alan
Sent: Thursday, August 24, 2006 12:59 PM
To: 'jbertrand@invacare.com'
Cc: Pejic, Ned; Raulerson, Billy; Hinton, Jennifer; Zitelli, William
Subject: FW: MK6 CONTROLLER UTILITY PAT APP STATUS thru 8/24/06

John,

Updated status through mid-day 8/24/06. 

I understand that you were to meet with the decision maker on foreign filing this morning. Currently, you have 
instructed us to file all 8 applications (5220, 5244, 5245, 5246, 5247, 5248, 5258, 5391) in Canada, two via 
PCT (5247 & 5258). Please let us know if these foreign filing instructions change. 

Preparation of two draft utility applications continues (5244 & 5391). All other drafts (5220, 5245, 5246, 5247,
5248, 5258) are complete.

Calfee's internal review process is complete for three (5247, 5248, 5258) of four draft applications. The fourth 
application (5245) remains in the internal review process. 

One draft application is with Invacare inventors for review (5220). 

One application has been approved by Invacare for filing (5246). US Declaration/POA and Assignment papers 
have been executed for this application. 

If anyone has any updates to the status, please let me know. 

Regards, 

Al 

ALAN BRANDT 

CALFEE, HALTER & GRISWOLD LLP 
1400 McDonald Investment Center 
800 Superior Avenue 
Cleveland, Ohio 44114-2688 
Tel: 1-216-622-8658 
Fax: 1-216-241-0816 
abrandt@calfee.com 
www.calfee.com 

The preceding e-mail message (including any attachments) contains information that may be confidential, be 
protected by the attorney-client or other applicable privileges, or constitute non-public information. It is intended 
to be conveyed only to the designated recipient(s). If you are not an intended recipient of this message, please 
notify the sender by replying to this message and then delete it from your system. Use, dissemination, 
distribution, or reproduction of this message by unintended recipients is not authorized and may be unlawful. 
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IMMOBILIZED SEQUENCE-SPECIFIC PROBES 

This invention relates to nucleic acid chemistry and to methods for detecting 
particular nucleic acid sequences. More specifically, the invention relates to a method for 
immobilizing DNA and RNA probes, stable assay reagents comprising the immobilized 
probes, and hybridization assays conducted with these immobilized probes. The invention 
has applications in the fields of medical diagnostics, medical microbiology, forensic 
science, environmental monitoring of microorganisms, food and drug quality assurance, 
and molecular biology.

Investigational microbiological techniques have been applied to diagnostic assays. 
For example, Wilson et al., U.S. Patent No. 4,395,486 discloses a method for detecting 
sickle cell anemia by restriction fragment length polymorphism (RFLP). Wilson et al.
identified a restriction enzyme capable of cleaving a normal globin gene but incapable of 
cleaving the mutated (sickle cell) gene. As sickle cell anemia arises from a point mutation, 
the method is effective but requires 10 to 20 ml of blood or amniotic fluid. 

Various infectious diseases can be diagnosed by the presence in clinical samples of 
specific DNA sequences characteristic of the causative microorganism or infectious agent. 
Pathogenic agents include certain bacteria, such as Salmonella, Chlamydia, and Neisseria;
viruses, such as the hepatitis, HTLV, and HIV viruses; and protozoans, such as
Plasmodium, responsible for malaria. U.S. Patent No. 4,358,535 issued to Falkow et al.
describes the use of specific DNA hybridization probes for the diagnosis of infectious
diseases. The Falkow et al. method for detecting pathogens involves spotting a sample
(e.g., blood, cells, saliva, etc.) on a filter (e.g., nitrocellulose), lysing the cells, and fixing
the DNA through chemical denaturation and heating. Then, labeled DNA probes are added
and allowed to hybridize with the fixed sample DNA, and hybridization indicates the
presence of the pathogen DNA. A problem inherent in the Falkow et al. procedure is
insensitivity; the procedure does not work well when very few pathogenic organisms are 
present in a clinical sample from an infected patient or when the DNA to be detected 
constitutes only a very small fraction of the total DNA in the sample. Falkow et al. do 
teach that the sample DNA may be amplified by culturing the cells or organisms in place on 
the filter. 

Routine clinical use of DNA probes for the diagnosis of infectious diseases would 
be simplified considerably if non-radioactively labeled probes could be employed as 
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described in EP 63,879 to Ward. In the Ward procedure, horseradish peroxidase (HRP) 
labeled DNA probes are detected by a chromogenic reaction similar to ELISA. The Ward 
detection methods and reagents are convenient but relatively insensitive, again because the 
specific sequence that must be detected is usually present in extremely small quantities. 
A significant improvement in DNA amplification, the polymerase chain reaction
(PCR) technique, was disclosed by Mullis in U.S. Patent No. 4,683,202, and detection
methods utilizing PCR are disclosed by Mullis et al. in U.S. Patent No. 4,683,195. In the
PCR technique, short oligonucleotide primers are prepared which match opposite ends of a
sequence to be amplified. The sequence between the primers need not be known. A
sample of DNA or RNA is extracted and denatured, preferably by heat. Then,
oligonucleotide primers are added in molar excess, along with dNTPs and a polymerase,
preferably Taq polymerase, which is stable to heat and commercially available from Perkin-
Elmer/Cetus Instruments. DNA polymerase is "primer-directed," in that replication initiates
at the two primer annealing sites. The DNA is replicated, and then again denatured.

This replication results in two "long products," which begin with the respective
primers, and the two original strands (per duplex DNA molecule). The products are called
"long products," only because there is no defined point of termination of the synthesized
strand. The reaction mixture is then returned to polymerizing conditions (e.g., by lowering
the temperature, inactivating a denaturing agent, and, if necessary, adding more
polymerase), and a second cycle initiated. The second cycle provides the two original
strands, the two long products from cycle one, two new long products (replicated from the
original strands), and two "short products" replicated from the long products produced in
cycle one. The products are called "short products," because these strands must terminate
at the 5' end of the "long product" template -- the end defined by the primer that initiated
synthesis of the long product. The short products contain the sequence of the target
sequence (sense or antisense) with a primer at one end and a sequence complementary to a
primer at the other end. On each additional cycle, an additional two long products are
produced, and a number of short products, equal to the number of long and short products
remaining at the end of the previous cycle, are also produced. Thus, the number of short
products can double with each cycle. This exponential amplification of a specific target
sequence allows the detection of extremely small quantities of DNA.
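
The product bookkeeping described above can be illustrated with a short Python sketch. It assumes an idealized reaction starting from a single duplex in which every strand is copied exactly once per cycle, which real reactions only approximate. The function name `pcr_product_counts` is hypothetical and is introduced here only for illustration.

```python
def pcr_product_counts(cycles: int) -> list[tuple[int, int, int]]:
    """Return (cycle, long_products, short_products) for one starting duplex.

    Idealized assumption: every strand present is copied once per cycle.
    """
    long_products = 0
    short_products = 0
    history = []
    for cycle in range(1, cycles + 1):
        # Each long or short product from the previous cycle templates
        # one new short product.
        new_short = long_products + short_products
        # The two original strands each template one new long product.
        long_products += 2
        short_products += new_short
        history.append((cycle, long_products, short_products))
    return history


for cycle, n_long, n_short in pcr_product_counts(5):
    print(f"cycle {cycle}: {n_long} long products, {n_short} short products")
```

Under these assumptions the long products grow only linearly (two per cycle), while the short products roughly double each cycle, which is why the short, primer-delimited target sequence comes to dominate the reaction mixture.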

The PCR process has revolutionized and revitalized the nucleic acid based medical 
diagnostics industry. Because the present invention provides reagents that will often be 
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utilized in conjunction with PCR, some additional background information on PCR may be 
helpful. The PCR process can be used to amplify any nucleic acid, including single or 
double-stranded DNA or RNA (such as messenger RNA), nucleic acids produced from a 
previous amplification reaction, DNA-RNA hybrids, or a mixture of any of these nucleic 
acids. If the original or target nucleic acid containing the sequence variation to be amplified 
is single stranded, its complement is synthesized by adding one or more primers, 
nucleotides, and a polymerase; for RNA, this polymerase is reverse transcriptase. 

The PCR process is useful not only for producing large amounts of one specific 
nucleic acid sequence, but also for amplifying simultaneously more than one different 
specific nucleic acid sequence located on the same or different nucleic acid molecules. 
When one desires to produce more than one specific nucleic acid sequence in PCR, the 
appropriate number of different oligonucleotide primers are utilized. For example, if two 
different specific nucleic acid sequences are to be produced, four primers can be utilized: 
two for each specific nucleic acid sequence to be amplified. 

The specific nucleic acid sequence amplified by PCR can be only a fraction of a 
larger molecule or can be present initially as a discrete molecule, so that the specific 
sequence amplified constitutes the entire nucleic acid. In addition, the sequence amplified 
by PCR can be present initially in an impure form or can be a minor fraction of a complex 
mixture, such as a portion of nucleic acid sequence due to a particular microorganism that 
constitutes only a very minor fraction of a particular biological sample. The nucleic acid or 
acids to be amplified may be obtained from plasmids such as pBR322, from cloned DNA 
or RNA, or from natural DNA or RNA from sources such as bacteria, yeast, viruses, and 
higher organisms such as plants or animals. DNA or RNA may be extracted from blood or 
tissue material such as chorionic villi or amniotic cells by a variety of techniques, including 
the well known technique of proteolysis and phenol extraction, as is common for 
preparation of nucleic acid for restriction enzyme digestion. In addition, suitable nucleic 
acid preparation techniques are described in Maniatis et al., Molecular Cloning: A
Laboratory Manual (New York, Cold Spring Harbor Laboratory, 1982), pp. 280-281;
U.S. Patent Nos. 4,683,195 and 4,683,202; EP 258,017; and Saiki et al., 1985,
Biotechnology 3:1008-1012.

Any specific nucleic acid sequence can be produced by the PCR process. It is only 
necessary that a sufficient number of bases at both ends of the sequence be known in 
sufficient detail so that two oligonucleotide primers can be prepared which will hybridize to 
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different strands of the desired sequence at relative positions along the sequence such that 
an extension product synthesized from one primer, when it is separated from its template 
(complement), can serve as a template for extension of the other primer. The greater the 
knowledge about the bases at both ends of the sequence, the greater can be the specificity 
of the primers for the target nucleic acid sequence, and thus the greater the probability that 
the process will specifically amplify the target.

The specific amplified nucleic acid sequence produced by PCR is produced from a
nucleic acid containing that sequence and called a template or "target." If the target nucleic
acid contains two strands, the strands are separated before they are used as templates, either
in a separate step or simultaneously with the synthesis of the primer extension products.
This strand separation can be accomplished by any suitable denaturing method, including
physical, chemical, or enzymatic means. One physical method of separating the nucleic
acid strands involves heating the nucleic acid until it is completely (>99%) denatured.
Typical heat denaturation involves temperatures ranging from about 80 to 105°C for times
ranging from about 1 second to 10 minutes. Strand separation may also be induced by a
helicase enzyme, or an enzyme capable of exhibiting helicase activity, e.g., the enzyme
RecA, which has helicase activity and in the presence of riboATP is known to denature
DNA. The reaction conditions suitable for separating the strands of nucleic acids with
helicases are described in Cold Spring Harbor Symposia on Quantitative Biology, Vol.
XLIII "DNA: Replication and Recombination" (New York, Cold Spring Harbor
Laboratory, 1978), B. Kuhn et al., "DNA Helicases", pp. 63-67, and techniques for using
RecA are reviewed in Radding, 1982, Ann. Rev. Genetics 16:405-437.

When the complementary strands of the nucleic acid or acids are separated, whether 
the nucleic acid was originally double or single stranded, the strands are ready to be used as 
a template for the synthesis of additional nucleic acid strands. The amplification reaction is 

25 generally conducted in a buffered aqueous solution, preferably at a pH of 7 to 9 (all pH 

values herein are at room temperature) most preferably about pH 8. Preferably, a molar 
excess (for cloned nucleic acid, usually about 1000:1 primentemplate, and for genomic 
nucleic acid, usually about 106-8:1 primentemplate) of the two oligonucleotide primers is 
added to the buffer containing the separated template strands. The amount of 

30 complementary strand may not be known, however, in many applications, so that the 

amount of primer relative to the amount of complementary strand may not be determinable 
with certainty.

The deoxyribonucleoside triphosphates dATP, dCTP, dGTP, and dTTP are also
added to the PCR mixture in adequate amounts, and the resulting solution is heated to about
90-100°C for about 1 to 10 minutes, preferably from 1 to 4 minutes. If the target nucleic
acid forms secondary structure, the nucleotide 7-deaza-2'-deoxyguanosine-5'-triphosphate
is also employed, as is known in the art, to avoid the potential problems such secondary
structure can cause. After heating, the solution is allowed to cool to room temperature,
which is preferred for primer hybridization. To the cooled mixture is added a polymerization agent,
and the polymerization reaction is conducted under conditions known in the art. This 
synthesis reaction may occur at temperatures primarily defined by the polymerization agent. 
Thus, for example, if an E. coli DNA polymerase is used as a polymerizing agent, the
maximum temperature for polymerization is generally no greater than about 40°C. Most
conveniently, the reaction using E. coli polymerase occurs at room temperature. For most
PCR applications, however, the thermostable enzyme Taq polymerase is employed at much 
higher temperatures, typically 50 to 70°C. 

Nevertheless, the polymerization agent for PCR may be any compound or system,
including enzymes, which will function to accomplish the synthesis of primer extension
products from nucleotide triphosphates. Suitable enzymes for this purpose include, for
example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T4
DNA polymerase, other available DNA polymerases, reverse transcriptase (used in the first
cycle of PCR if the target is RNA), and other enzymes, including heat-stable enzymes such
as Taq polymerase, which will facilitate combination of the nucleotides in the proper
manner to form the primer extension products which are complementary to a target nucleic
acid strand. Generally, synthesis will be initiated at the 3' end of each primer and proceed
in the 5' direction along the template strand, until synthesis terminates. There may be
agents, however, which initiate synthesis at the 5' end and proceed in the other direction,
and there seems no reason such agents could not also be used as polymerization agents in
PCR.

The newly synthesized strand in PCR is base paired to a complementary nucleic
acid strand to form a double-stranded molecule, which in turn is used in the succeeding
steps of the PCR process. In the next step, the strands of the double-stranded molecule are
separated to provide single-stranded molecules on which new nucleic acid is synthesized.
Additional polymerization agent, nucleotides, and primers may be added if necessary for
the reaction to proceed. The PCR steps of strand separation and extension product
synthesis can be repeated as often as needed to produce the desired quantity of the specific
nucleic acid sequence.
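
The effect of repeating the cycle can be illustrated with a short Python sketch. It assumes, as an idealization not stated in the text, that each cycle multiplies the number of target copies by (1 + efficiency), with an efficiency of 1.0 meaning perfect doubling. The function names and example numbers are hypothetical.

```python
import math

def copies_after(cycles: int, start_copies: float = 1.0, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return start_copies * (1.0 + efficiency) ** cycles

def cycles_needed(target_copies: float, start_copies: float = 1.0, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches at least target_copies."""
    if target_copies <= start_copies:
        return 0
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))

# Copies from a single template after 20 cycles of perfect doubling.
print(copies_after(20))
# Cycles required for at least a million-fold amplification under the same assumption.
print(cycles_needed(1e6))
```

Real reactions fall short of perfect doubling, so a value of efficiency below 1.0 gives a more cautious estimate of the number of cycles required.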

As noted above, PCR has revolutionized the nucleic acid based diagnostics
industry. European Patent Office Publication 237,362, incorporated herein by reference,
discloses assay methods employing PCR. In EP 237,362, PCR-amplified DNA is fixed to
a filter and then treated with a prehybridization solution containing SDS, Ficoll, serum
albumin, and various salts. A specific oligonucleotide probe (of, e.g., 16 to 19 nucleotides)
is then added and allowed to hybridize. Preferably, the probe is labeled to allow for
detection of hybridized probes. EP 237,362 also describes a "reverse" dot blot, in which
the probe, instead of the amplified DNA, is fixed to the membrane.

The recent advent of PCR technology has enabled the detection of specific DNA
sequences initially present in only minute (<1 ng) quantities. For example, Higuchi et al.,
1988, Nature 332:543-546, describe the characterization of genetic variation between
individuals based on samples containing only a single hair. DNA was isolated from the
hair by digestion and extraction and then treated under PCR conditions to obtain
amplification. Specific nucleotide variations were then detected by fragment length
polymorphism (PCR-FLP), by hybridization to sequence-specific oligonucleotide (SSO)
probes (a technique also described in Saiki et al., 1986, Nature 324:163-166), or by direct
sequencing via the dideoxy method (using amplified DNA rather than cloned DNA).
Because PCR results in the replication of a DNA sequence positioned between two
primers, insertions and deletions between the primer sequences result in product sequences
of different lengths, which can be detected by sizing the product in PCR-FLP. In SSO
hybridization, the amplified DNA can be fixed to a nylon filter by UV irradiation in a series
of "dot blots" and, in one variation of the technique, then allowed to hybridize with an
oligonucleotide probe labeled with HRP under stringent conditions. After excess probe is
removed by washing, 3,3',5,5'-tetramethylbenzidine (TMB) and H2O2 are added; HRP
catalyzes H2O2 oxidation of TMB to a blue precipitate, the presence of which indicates
hybridized probe. U.S. Patent No. 4,789,630, incorporated herein by reference, describes
protocols and TMB compounds useful for purposes of the present invention. One may
alternatively use one of the other leuco dyes (such as a red leuco dye developed by DuPont
and licensed to Kodak) to indicate the presence of HRP. However, any chromogen that
develops precipitable color or fluorescence as a consequence of peroxidatic activity can be
used to detect HRP-labeled reagents. In fact, any enzyme can be used to label, so long as
there exists a colorless substrate which forms a colored or fluorescent product as a result of
enzyme activity and the product can be captured on a solid support. Separate dot blot
hybridizations are performed for each allele tested.

Church et al., 1984, Proc. Natl. Acad. Sci. USA 81:1991-1995, disclose a
method for genomic sequencing which comprises cross-linking restriction enzyme-digested
genomic DNA fragments to nylon membranes using UV irradiation, and probing the bound
fragments with comparatively long (100-200 bp) DNA probes. Church et al. also disclose
that NTPs dried onto nylon membranes and UV irradiated at 0.16 kJ/m2 for two minutes
are bound more stably (i.e., TTP = 130x, dGTP = 30x, dCTP = 20x, and dATP = 10x)
than non-UV irradiated nucleotides. Primary amino groups are highly reactive with 254
nm light-activated thymine (see Saito et al., 1981, Tetrahedron Lett. 22:3265-68), and this
reactivity is believed to be the mechanism by which nucleotides become covalently bound
to a membrane.

The detection of genetic variations using SSO probes is typically performed by first 
denaturing and immobilizing the sample DNA on a nylon or nitrocellulose membrane. The 
membrane is then treated with short (15-20 base) oligonucleotides under stringent 
hybridization conditions, allowing annealing only in cases of exact complementarity. A 
large number of hybridizations must be performed when a sample is examined for the 
presence of many different sequences. For example, a test for the most common genetic 
mutations that lead to beta-thalassemia in Mediterranean populations would involve 12 
probes and require 12 separate hybridizations, accomplished either by probing one filter 12 
times or by conducting simultaneous hybridizations on 12 replicate filters (or some 
combination thereof). A DNA-based HLA typing test can require 20 to 50 probes and
hybridizations, a prohibitive effort if one uses the prior art methods that require either 
splitting the sample into as many portions as there are probes or blotting the sample 
followed by probing with a single probe and then removing the probe, a process that must 
be repeated for each probe tested. 

In traditional nucleic acid detection by oligomer hybridization, the DNA in the test 
sample, including the hybridization target, is noncovalently chemisorbed onto a solid 
support such as nitrocellulose or nylon and then hybridized to a labeled target-specific 
probe which, except in the SSO methodology just described, usually contains hundreds to 
thousands of nucleotides and is made biosynthetically. This method suffers from multiple 
deficiencies. The noncovalent target capture generally is weak enough that considerable
target may be washed from the solid support during detection (see, for example, Gingeras
et al., 1987, Nucleic Acids Research 15:5373-5390 and Gamper et al., 1986, Nucleic
Acids Research 14:9943-9954). Target chemisorption reduces the reactivity of the target
sequence toward hybridization with probe. The capture and hybridization processes
normally take many hours to reach completion. The need to chemisorb the target
immediately before detection prevents the manufacture of a storage-stable capture reagent
with built-in target specificity that can be applied rapidly to test samples. The sequence
non-specificity of capture complicates the examination of a single test sample with more than
one probe: either a lot of test sample must be available to load different solid supports to
incubate with the various probes or a lot of time must be consumed in serial probing of a
singly immobilized test sample.

Ranki et al., 1983, Gene 21:77-85, improve on the traditional technology by
creating a sequence-specific capture reagent, capturing the target sequence from the test
sample by nucleic acid hybridization and increasing specificity by detecting captured target
with a second, labeled, sequence-specific nucleic acid probe. However, their technology
still suffers from multiple deficits. The sequence-specific capture probe is immobilized by
chemisorption, so that the assay still is vulnerable to signal attenuation by desorption of
both capture probe and probe-target complex during the incubations and washes.
Chemisorption reduces probe reactivity, requiring long incubation times to maximize
capture efficiency. Two nucleic acid probes must be manufactured instead of one. The
capture and detection probes of Ranki et al. are so large that they must be prepared by
biosynthetic instead of much cheaper chemical synthetic routes. A small capture probe
would not be immobilized efficiently by chemisorption.

Gingeras et al., 1987, supra, improve further on DNA probe technology by
covalently attaching relatively short, chemically synthesized, oligonucleotide hybridization
probes to a solid support, dramatically reducing hybridization time. However, direct
coupling of the target-specific sequence to the support risks reduced reactivity caused by
steric occlusion by the support. Furthermore, the method demonstrated no way of
detecting captured DNA apart from the incorporation of radioactive label into the target, a
procedure which is relatively hazardous and inconvenient. Finally, the beaded solid
support of Gingeras et al. is hard to adapt to assays in which multiple targets are probed,
because the test sample must be exposed to separate containers of beads carrying the
different probes, taking care not to mix beads with different specificities.

Gamper et al., supra, describe a different strategy to accelerate oligomer
hybridization: oligo hybridization is performed in solution rather than on a solid support,
the hybrid species being simultaneously photochemically trapped, because the
target-specific oligomer has been chemically modified with a moiety which crosslinks
double-stranded DNA when irradiated. However, apart from the expense of creating the
photo-adduct labeling reagent, this method suffers the inconvenience and delay associated with
ultrafiltration to remove the considerable excess of unreacted probe, followed by gel
electrophoresis to purify the hybridization product to the point that it can be identified. This
procedure would be particularly inconvenient to adapt to simultaneous probing of multiple
targets, because of the need to engineer targets to be electrophoretically resolvable.

The present invention provides a particularly advantageous assay method which
permits the simultaneous nonisotopic detection of two or more specific nucleic acid
sequences or control conditions in a single test sample, using a single solid support divided
into discrete regions to which different oligonucleotide probes have been covalently
attached via spacer arms. The method comprises:

(a) attaching the probes to defined regions of the solid support through
    spacer arms, attached at one end to the probe and at the other end to
    the support;

(b) reacting the test sample with the probe-bearing solid support under
    conditions promoting hybridization of the probes to any
    single-stranded complementary nucleic acid sequences in the test sample;

(c) washing away any nucleic acid not hybridized to probe; and

(d) detecting the probe-captured nucleic acid, preferably
    nonisotopically.
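
Steps (a) through (d) can be sketched in Python as a toy model. The model treats hybridization as exact reverse-complement matching of a probe within a single-stranded sample sequence, which is a simplifying assumption rather than a description of real hybridization kinetics. The region names, probe sequences, and sample strand are hypothetical.

```python
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def hybridize(support: Dict[str, str], sample: List[str]) -> Dict[str, bool]:
    """Report, for each region of the support, whether its probe captured a strand.

    support maps a region name to the probe attached there (step a).
    sample is a list of single-stranded sequences applied to the support (step b).
    Strands that match no probe are ignored, standing in for the wash (step c).
    The returned mapping stands in for detection at each region (step d).
    """
    result = {}
    for region, probe in support.items():
        target_site = reverse_complement(probe)
        result[region] = any(target_site in strand.upper() for strand in sample)
    return result

# Hypothetical probes attached at two discrete regions of one support;
# they differ at a single base.
support = {"region_1": "GACTCCTGAGGAGAAGT", "region_2": "GACTCCTGTGGAGAAGT"}
# Hypothetical sample strand containing the exact complement of the region_1 probe.
sample = ["CC" + reverse_complement("GACTCCTGAGGAGAAGT") + "TT"]
print(hybridize(support, sample))
```

Because every region is read from the same support after one incubation, the sample is exposed to all probes under identical conditions, which is the point of the discrete-region format.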

Because of the permanence of covalent attachment, the preparation of immobilized probes
can be separated in time from their use, permitting manufacture of a storage-stable detection
reagent, the hybridization capture support, which can be used rapidly to detect target
nucleic acid sequences in test samples on demand. Covalent probe attachment and the use
of a spacer arm between support and probe greatly accelerate and improve the efficiency of
hybridization. The use of a dimensionally stable solid support with discrete regions for
different probes greatly improves the economics, simplifies the physical format, and
increases the reliability of hybridization and detection, because all target sequences in a
single test sample and all control conditions can be probed simultaneously in a single short
incubation and because all probe-target hybrids are exposed to identical incubation, wash,
and detection conditions. Nonisotopic detection, whether via colored or fluorescent labels
directly attached to the target nucleic acid or via colored, fluorescent, or enzyme labels
indirectly attached to the target nucleic acid through a specific binding reaction, is much
safer and more convenient than detection of radioactive atoms attached to the target nucleic
acid, especially when developing storage-stable detection reagents and assay kits.

The invention also relates to a novel, stable, assay reagent comprising
oligonucleotide probes covalently attached to discrete regions of a solid support via spacer
arms, which probes have sequences designed to hybridize to different analyte nucleic acids
in the test sample or to indicate different (positive or negative) control conditions which test
the validity of the assay conditions. This assay reagent will have significant commercial
impact, being ideally suited to large-scale, automated, manufacturing processes and having
a long shelf life. The reagent will prove especially useful in situations where the number of
target sequences exceeds the number of samples tested. In general, the greater the number
of target sequences and therefore immobilized probes, the greater the improvement of the
invention over what has gone before. With PCR-amplified DNA samples, a simple test can
easily be assayed for over fifty specific sequences on a single solid support. The
nonisotopic detection aspect of the invention is especially well suited to target sequences
generated by PCR, a process which permits covalent attachment to all target molecules of
colored or fluorescent dyes and of binding moieties like biotin, digoxin, and specific
nucleic acid sequences. The
methods and reagents of the invention are also suited to detection of isotopically labeled
nucleic acid, although this mode is not preferred.

An important aspect of the invention relates to a specific chemistry for attaching
oligonucleotide probes to a solid support in a way which is especially suitable to large-scale
manufacture and which permits maximization of probe retention and hybridization
efficiency. This chemistry comprises covalent attachment of a polynucleotide (preferably
poly-dT) tail to the probe and fixation by the ultraviolet irradiation of the photoreactive
tailed probe to a solid support bearing primary or secondary amines (e.g., a nylon
membrane). However, the invention provides numerous alternative, non-photochemical
ways to attach probe to support, wherein electrophilic reagents are used to couple the probe
to the spacer and the spacer to the solid support in either reaction order, and wherein the
spacer can be any of a large variety of organic polymers or long-chain compounds.

Another aspect of the invention relates to a DNA sequence detection kit, which kit
comprises the stable assay reagent, essentially a solid support having oligonucleotide 
probes covalently bound thereto via a spacer arm. The kit can also include PCR reagents, 
including PCR primers selected for amplification of DNA sequences capable of hybridizing 
with the oligonucleotide probes. 

To aid in understanding and describing the invention, the following terms are
defined below:

"Allele-specific oligonucleotide" (ASO) refers to a probe that can be used to
distinguish a given allelic variant from all other allelic variants of a particular allele by 
hybridization under sequence-specific hybridization conditions. 

"DNA polymorphism" refers to the condition in which two or more different 
variations of a nucleotide sequence exist in the same interbreeding population. 

"Genetic disease" refers to specific deletions and/or mutations in the genomic DNA 
of an organism that are associated with a disease state and include sickle cell anemia, cystic 
fibrosis, alpha-thalassemia, beta-thalassemia, and the like. 

"Label" refers to any atom or molecule which can be used to provide a detectable 
(preferably quantifiable) signal and which can be attached to a nucleic acid or protein. 
Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, X-ray 
diffraction or absorption, magnetism, enzymatic activity, and the like. Suitable labels 
include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I),
electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are
typically detected by their activity. For example, HRP can be detected by its ability to 
convert diaminobenzidine (more preferably, however, TMB is used) to a blue pigment, 
quantifiable with a spectrophotometer. It should be understood that the above description 
is not meant to categorize the various labels into distinct classes, as the same label may 
serve in several different modes. For example, 125I may serve as a radioactive label or as
an electron-dense reagent. HRP may serve as enzyme or as antigen for an antibody, such 
as a monoclonal antibody (MAb). Further, one may combine various labels for desired 
effect. For example, MAbs and avidin can be labeled and used in the practice of this 
invention. One might label a probe with biotin, and detect its presence with avidin labeled 
with 125I or with an antibiotin MAb labeled with HRP. Alternatively, one may employ a
labeled MAb to dsDNA (or hybridized RNA) and thus directly detect the presence of 
hybridization without labeling the nucleic acids. Other permutations and possibilities will
be readily apparent to those of ordinary skill in the art and are considered within the scope
of the instant invention. 

"Oligonucleotide" refers to primers, probes, oligomer fragments, oligomer 
controls, and unlabeled blocking oligomers and is a molecule comprised of two or
more deoxyribonucleotides or ribonucleotides. An oligonucleotide can also contain
nucleotide analogues, such as phosphorothioates and alkyl phosphonates, and derivatized
(i.e., labeled) nucleotides. The exact size of an oligonucleotide will depend on many
factors, which in turn depend on the ultimate function or use of the oligonucleotide. 

"Primer" refers to an oligonucleotide, whether occurring naturally or produced 
synthetically, which is capable of acting as a point of initiation of synthesis when placed
under conditions in which synthesis of a primer extension product complementary to a
nucleic acid strand can occur. The primer is preferably an oligodeoxyribonucleotide and is
single stranded for maximum efficiency in amplification, but may alternatively be double 
stranded. If double stranded, the primer is first treated to separate its strands before being 
used to prepare extension products. The primer must be sufficiently long to prime the
synthesis of extension products in the presence of the polymerase, but the exact length of a
primer will depend on many factors. For example, for diagnostics applications, the 
oligonucleotide primer typically contains 15 to 25 nucleotides. Short primer molecules 
generally require cooler temperatures to form sufficiently stable hybrid complexes with 
template. Suitable primers for amplification are prepared by means known to those of
ordinary skill in the art, for example by cloning and restriction of appropriate sequences,
direct chemical synthesis, and purchase from a commercial supplier. Chemical methods for
primer synthesis include: the phosphotriester method described in Narang et al., 1979,
Meth. Enzymol. 68:90 and U.S. Patent No. 4,356,270; the phosphodiester method
disclosed in Brown et al., 1979, Meth. Enzymol. 68:109; the diethylphosphoramidite
method disclosed in Beaucage et al., 1981, Tetrahedron Lett. 22:1859-1862; and the solid
support method disclosed in U.S. Patent No. 4,458,066. The primers may also be
labeled, if desired. 

"Restriction fragment length polymorphism" (RFLP) refers to a DNA 
polymorphism at a restriction enzyme recognition site. The restriction enzyme specific for
the polymorphic site can be used to digest sample DNA, and when the digested DNA is
fractionated by electrophoresis and, if necessary, treated for visualization, different samples
produce different restriction endonuclease patterns, depending on the particular
polymorphic sequence present in the sample.
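
The way a polymorphism at a recognition site changes the fragment pattern can be illustrated with a minimal Python sketch. The two allele sequences are hypothetical, and the example uses the EcoRI recognition site GAATTC (cut after the first G) purely for illustration. Only one strand is modeled.

```python
from typing import List

def digest(dna: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting dna at every occurrence of site.

    cut_offset is the position within the recognition site where the
    strand is cut (for example 1 for G^AATTC).
    """
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

# Hypothetical alleles: allele_b carries a single-base change that destroys the site.
allele_a = "ATGCATGAATTCGGCTAAGC"
allele_b = "ATGCATGAGTTCGGCTAAGC"
print(digest(allele_a, "GAATTC", 1))  # site present: two fragments
print(digest(allele_b, "GAATTC", 1))  # site absent: one uncut fragment
```

Sizing the fragments, as electrophoresis does, is enough to tell the two alleles apart without reading the sequence itself.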

"Sequence-specific hybridization" refers to strict hybridization conditions in which 
exact complementarity between probe and sample target sequence is required for 
hybridization to occur. Such conditions are readily discernible by those of ordinary skill in 
the art and depend upon the length and base composition of the probe. In general, one may 
vary the temperature, pH, ionic strength, and concentration of chaotropic agent(s) in the 
hybridization solution to obtain conditions under which substantially no probes will 
hybridize in the absence of an "exact match." For hybridization of probes to bound DNA, 
the empirical formula for estimating optimum temperature under standard conditions (0.9 M
NaCl) is: T(°C) = 4(N_G + N_C) + 2(N_A + N_T) - 5°C, where N_G, N_C, N_A, and N_T are the
numbers of G, C, A, and T bases in the probe (J. Meinkoth et al., 1984, Analyt.
Biochem. 138:267-284). Those of skill in the art recognize, however, that this calculation
only gives an approximate value for optimum temperature, which should then be
empirically tested to obtain the true optimum temperature. The probe utilized in a
sequence-specific hybridization is called a "sequence-specific oligonucleotide" (SSO), which can also
be an allele-specific oligonucleotide (ASO). Those of skill in the art recognize that for a
single mismatch between probe and target to be destabilizing, the hybridizing region of the 
probe must be relatively short, generally no longer than about 23 bases, and usually about 
17 to 23 bases in length. 
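
The empirical formula above is simple enough to express as a short Python function. This is a minimal sketch; the function name and the example probe sequence are hypothetical, and the result is only the approximate starting value that the text says must then be refined empirically.

```python
def estimated_hybridization_temp_c(probe: str) -> int:
    """Estimate T(°C) = 4(N_G + N_C) + 2(N_A + N_T) - 5 for a short probe."""
    probe = probe.upper()
    if not probe or set(probe) - set("ACGT"):
        raise ValueError("probe must be a non-empty string of A, C, G, T")
    gc = probe.count("G") + probe.count("C")
    at = probe.count("A") + probe.count("T")
    return 4 * gc + 2 * at - 5

# Hypothetical 17-base probe with 9 G/C bases and 8 A/T bases.
print(estimated_hybridization_temp_c("GACTCCTGAGGAGAAGT"))  # 4*9 + 2*8 - 5 = 47
```

G and C bases are weighted twice as heavily as A and T bases, so two probes of the same length can have noticeably different estimated temperatures.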

"Specific binding partner" refers to a protein capable of binding a ligand molecule 
with high specificity, as for example in the case of an antigen and an antibody or MAb 
specific therefor. Other specific binding partners include biotin and avidin, streptavidin, or 
an anti-biotin antibody; IgG and protein A; and the numerous receptor-ligand couples 
known in the art. 

To aid in understanding the invention, several Figures accompany the description of 
the invention. These Figures are briefly described below. 

Figure 1 depicts a plot of probe binding as a function of UV exposure. 

Figure 2 depicts a plot of hybridization efficiency as a function of UV exposure. 

Figure 3 depicts a series of dot blots demonstrating the presence of either normal 
beta-globin or sickle cell beta-globin, obtained by sandwich assay. 

Figure 4 depicts a series of dot blots demonstrating the presence of either normal 
beta-globin or sickle cell beta-globin, obtained by direct assay.

Figure 5 depicts a series of dot blots demonstrating HLA DQalpha genotyping.

Figure 6 depicts a series of dot blots demonstrating beta-thalassemia typing.

The present invention provides a method for detecting the presence of a specific
nucleotide sequence in a sample by contacting the sample with immobilized oligonucleotide
probes under conditions that allow for hybridization of complementary nucleic acid
sequences. In one embodiment of the method, the test sample is contacted with a solid
support upon which are immobilized probes specific for one or more target sequences (the
"analyte"), probes for a positive control sequence that should be present in all test samples,
and optionally probes for a negative control sequence that should not be present in any test
sample; each different probe is immobilized at a distinct region on the solid support. If an
analytical signal is detected in the negative control region or if no analytical signal is
detected in the positive control region, then the validity of the response in the analyte region
is suspect, and the sample should be retested. In another preferred embodiment, a number
of different analyte-specific probes are covalently attached to distinct regions of a solid
support so that one can test for allelic variants of a given genetic locus in a single
hybridization reaction. In still another preferred embodiment, the various analyte-specific
probes are complementary to nucleotide sequences present in various microorganisms.
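
The control logic described above can be sketched in Python. The function and region names are hypothetical, and signals are reduced to a simple detected/not-detected flag for each region, which is an assumption made for illustration.

```python
from typing import Dict, Optional

def interpret_support(signals: Dict[str, bool],
                      positive_control: str,
                      negative_control: Optional[str] = None) -> dict:
    """Interpret the per-region signals read from one solid support.

    signals maps each region name to True if an analytical signal was
    detected there. The run is treated as suspect (retest) when the
    positive control is silent or the optional negative control gives a signal.
    """
    positive_ok = signals[positive_control]
    negative_ok = negative_control is None or not signals[negative_control]
    valid = positive_ok and negative_ok
    controls = {positive_control, negative_control}
    analytes = {region: hit for region, hit in signals.items() if region not in controls}
    return {"valid": valid, "analytes": analytes if valid else None}

# Hypothetical readout: both controls behave, and one analyte region gives a signal.
readout = {"positive": True, "negative": False, "allele_A": True, "allele_S": False}
print(interpret_support(readout, "positive", "negative"))
```

Analyte results are withheld when either control fails, mirroring the statement that the response in the analyte region is then suspect and the sample should be retested.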

The immobilized probes are also an important aspect of the invention. The probes
comprise two parts: a hybridizing region composed of a nucleotide sequence of about 10 to
50 nucleotides (nt) and a spacer arm, at least as long as the hybridizing region, which is
covalently attached to the solid support and which acts as a "spacer", allowing the
hybridizing region of the probe to move away from the solid support, thereby improving
the hybridization efficiency of the probe. In a preferred embodiment, the spacer arm is a
sequence of nucleotides, called the "tail", that serves to anchor the probes to the solid
support via covalent bonds between nucleotides within the tail of the probe and reactive
groups within the solid support matrix.
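
The two-part probe structure can be represented with a small Python data class. This is a sketch only: the class and method names are hypothetical, and treating "about 10 to 50 nt" as hard limits is an assumption made to keep the check simple.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TailedProbe:
    hybridizing_region: str   # sequence that pairs with the target
    tail: str                 # nucleotide spacer arm, e.g. a poly-dT tail

    def design_problems(self) -> List[str]:
        """Return a list of departures from the design guidelines in the text."""
        problems = []
        n = len(self.hybridizing_region)
        if not 10 <= n <= 50:
            problems.append(f"hybridizing region is {n} nt; about 10 to 50 nt is described")
        if len(self.tail) < n:
            problems.append("spacer arm is shorter than the hybridizing region")
        return problems

# Hypothetical 17-nt hybridizing region with a 40-nt poly-dT tail.
probe = TailedProbe("GACTCCTGAGGAGAAGT", "T" * 40)
print(probe.design_problems())   # an empty list means no problems were found
```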

As described more fully below, the immobilized probes of the invention avoid the 
problems inherent in prior art detection methods with immobilized probes. These problems 
include a lack of sensitivity, for many prior art methods for immobilizing probes actually 
result in the hybridizing region of the probe becoming attached to the solid support and thus
less free to hybridize to complementary sequences in the sample. In addition, the prior art
methods for synthesizing and attaching the spacer to the probe and to the solid support are
complicated, time-consuming, expensive, and often involve the use of toxic reagents. In
marked contrast, a preferred method of the invention for synthesizing and immobilizing
probes is quickly completed with readily available, relatively nontoxic, reagents, and with 
practically no chemical manipulations. The polynucleotide tails of the invention are 
composed of nucleotides that are attached to the hybridizing region with an enzyme or with 
the aid of commercially available nucleic acid synthesizers. The tails of the invention are 
attached to the solid support by a similarly problem-free method: exposure to ultraviolet 
(UV) light. 

Those of skill in the art recognize that nucleic acid hybridization serves as the basis 
for a number of important techniques in the medical diagnostics and forensics industries. 
In addition, nucleic acid hybridization serves as an important tool in the laboratories where 
scientific advances in many diverse fields occur. The present invention represents an 
important step in making nucleic acid based diagnostics even more powerful and useful. 
As noted above, PCR has played an important role in these same industries and 
laboratories, and the present invention will often be practiced on samples in which the 
nucleic acid has been amplified by PCR. Various useful embodiments of the present 
invention are described below, but the full scope of the invention can only be realized when 
understood and utilized by the various and diverse practitioners of nucleic acid based 
diagnostics. 

One very important use of the present invention relates to the detection and 
characterization of specific nucleic acid sequences associated with infectious diseases, 
genetic disorders, and cellular disorders, including cancer. In these embodiments of the 
invention, amplification of the target sequence is again useful, especially when the amount 
of nucleic acid available for analysis is very small, as, for example, in the prenatal 
diagnosis of sickle cell anemia using DNA obtained from fetal cells. Amplification is 
particularly useful if such an analysis is to be done on a small sample using non-radioactive 
detection techniques which may be inherently insensitive, or where radioactive techniques 
are being employed but where rapid detection is desirable. 

The immobilized probes provided by the present invention not only are useful for 
detecting infectious diseases and pathological abnormalities but also are useful in detecting 
DNA polymorphisms not necessarily associated with a pathological state. The term 
forensic is most often used in a context pertaining to legal argument or debate. Individual 
identification on the basis of DNA type is playing an ever more important role in the law. 
For example, DNA typing can be used in the identification of biological fathers and so
serves as an important tool for paternity testing. DNA typing can also be used to match
biological evidence left at the scene of a crime with biological samples obtained from an 
individual suspected of committing the crime. In a similar fashion, DNA typing can be 
used to identify biological remains, whether those remains are a result of a crime or some 
non-criminal activity. The practice of forensic medicine now routinely involves the use of 
DNA probes, in protocols that can be made more efficient by employing the present
invention. 

To achieve the important and diverse benefits of the present invention, one must 
first synthesize the probes to be immobilized on a solid support. The probe sequence can 
be synthesized in the same manner as any oligonucleotide, and a variety of suitable
synthetic methods were noted above in the discussion of PCR primers. The hybridizing
region of the probes of the invention is typically about 10 to 50 nt in length, and more often
17 to 23 nt in length, but the exact length of the hybridizing region will of course depend
on the purpose for which the probe is used. Often, for reasons apparent to those of skill in
the art, the hybridizing region of the probe will be designed to possess exact
complementarity with the target sequence to be detected, but once again, the degree of
complementarity between probe and target is somewhat tangential to the present invention.
The probes of the invention are, however, preferably "tailed" with an oligonucleotide 
sequence that plays a critical role in obtaining the benefits provided by the present 
invention. 

This tail of the probes of the invention consists of ribonucleotides or 
deoxyribonucleotides (e.g., dT, dC, dG, and dA). The nucleotides of the tail can be 
attached to the hybridizing region of the probe with terminal deoxynucleotidyl transferase 
(TdT) by standard methods. In addition, the entire tailed probe can be synthesized by 
chemical methods, most conveniently by using a commercially available nucleic acid 
synthesizer. One can also synthesize the tails and hybridizing regions separately and then 
combine the two components. For instance, a preparation of tails can be prepared (and 
even attached to a solid support, such as a bead) and then attached to a preparation of 
hybridizing regions. 

When using a DNA synthesizer to make the tailed probes of the invention, one 
should take steps to avoid making a significant percentage of molecules that, due to a 
premature chain termination event, do not contain a hybridizing region. One such step 
involves synthesizing the hybridizing region of the probe first, creating a tailed probe with 
the hybridizing region at the 3' end of the molecule. Because the likelihood of a premature 
chain termination event increases with the length of the molecule, this step increases the 
likelihood that if a chain termination event occurs, the occurrence merely results in a shorter 
tail. However, because most premature chain termination events are a result of a failure to 
"de-block" during synthesis, and because the exact number of residues in the tail of a probe 
of the invention is not critical, one may also merely omit the blocking and de-blocking steps 
during automated synthesis of the tail region of the probe. If these steps are omitted 
(during tail synthesis only), the tail can be placed on either the 5' or 3' end of the probe 
with equal efficiency and satisfactory results. 

As noted above, the preferred spacer arms of the probes of the invention are 
comprised of nucleotide tails. Because the tails serve to attach the probe to the solid 
support, the relative efficiency with which a given oligonucleotide will react with a solid 
support is important in choosing the sequence to serve as the tail in the probes of the 
invention. Most often, the tail will be a homopolymer, and Figure 1, below, depicts the 
relative efficiencies with which synthetic oligonucleotides with varying length 
homopolymer tails were covalently bound to a nylon filter as a function of UV exposure. 
As shown in the figure, oligonucleotides with longer poly-dT tails were more readily fixed 
to the membrane, and all poly-dT-tailed oligonucleotides attained their maximum values by 
240 mJ/cm2 of irradiation at 254 nm. In contrast, the poly-dC (400 nt in length) tail 
required more irradiation to crosslink to the membrane and was not comparable to the 
equivalent poly-dT tail even after 600 mJ/cm2 exposure. Untailed oligonucleotides were 
retained by the filter in a manner roughly parallel to that of the poly-dC-tailed probes. 

Thus, the probes of the invention preferably comprise a poly-dT tail of greater than 
10 thymidine (T) residues. Usually, the tail will comprise at least 100 T residues, and most 
preferably, the tail will comprise at least 400 dT nucleotides. As the poly-dT tail functions 
primarily to bind the probe to the solid support, the exact number of dT nucleotides is not 
critical, as noted above. Although those of skill in the art will readily recognize the fact, it 
should be noted that the composition of the tail need not be homogeneous, i.e., a mixture 
of nucleotides may be used. Preferably, however, the tail will include a significant number 
of thymine bases, as T reacts most readily with the solid support by the preferred methods 
for making the immobilized probes of the invention, which methods are discussed more 
fully below. If one desires to utilize the probes in sequence-specific hybridizations, one 
must be aware of the problem caused by creation of a random sequence in the 
heterogeneous tail that closely resembles the hybridizing region of a probe. If a 
heterogeneous tail is employed, it is still desirable to maintain a distribution of 150 
pyrimidine residues per tail, if the probes are to be fixed to a solid support by UV 
irradiation. 

The tail should always be larger than the hybridizing region, and the greater the 
disparity in size between the hybridizing region and the tail (so long as the tail is larger), the 
more likely it is that the tail, rather than the hybridizing region, will react with the support, 
a preferred condition. Larger tails thus increase the likelihood that only the tail will 
participate in reacting with and thereby binding to the solid support. Because the tail also 
functions as a "spacer," enabling the complementary sequence to diffuse away from the 
solid support where it may hybridize more easily, free of steric interactions, larger tails are 
doubly preferred. Excessively extended tails, however, are uneconomical, and, if carried 
to an extreme, excessive tailing could have adverse effects. 

A preferred method of synthesizing a probe of the invention is as follows. The 
probe is synthesized on a DNA synthesizer (the Model 8700, marketed by Biosearch, is 
suitable for this purpose) with beta-cyanoethyl N,N-diisopropyl phosphoramidite 
nucleosides (available from American Bionetics) using the protocols provided by the 
manufacturer. If desired, however, only the hybridizing region of the probe is synthesized 
on the instrument, and then 200 pmol of the probe are tailed in 100 µl of reaction buffer at 
pH = 7.6 and containing 100 mM cacodylate, 25 mM Tris-base, 1 mM CoCl2, and 0.2 mM 
dithiothreitol with 5 to 160 nmol deoxyribonucleotide triphosphate (dTTP) and 60 units (50 
pmol) of terminal deoxyribonucleotidyl transferase (available from Ratliff Biochemicals) 
for 60 minutes at 37 degrees C (see Roychoudhury et al., 1980, Meth. Enz. 65:43-62, for 
buffer preparation). Reactions are conveniently stopped by the addition of 100 µl of 10 mM 
EDTA. The lengths of tails can be controlled by limiting the amount of dTTP (or other 
nucleotide) present in the reaction mixture. For example, a nominal tail length of 400 dT 
residues is obtained by using 80 nmol of dTTP in the protocol described above. 
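
The relationship between nucleotide amount and nominal tail length can be illustrated with a short calculation. The sketch below is not part of the described protocol; it simply assumes that all of the supplied nucleotide is incorporated and is shared evenly among the probe molecules, which is what "nominal" is taken to mean here. The function names are hypothetical.

```python
def nominal_tail_length(nucleotide_nmol: float, probe_pmol: float) -> float:
    """Average tail residues per probe molecule.

    Assumption (not stated in the protocol): every nucleotide is
    incorporated and distributed evenly over all probe 3' ends.
    """
    if probe_pmol <= 0:
        raise ValueError("probe amount must be positive")
    # Convert nmol of nucleotide to pmol before dividing.
    return nucleotide_nmol * 1000.0 / probe_pmol


def nucleotide_needed_nmol(tail_length: float, probe_pmol: float) -> float:
    """Nucleotide (nmol) required for a desired nominal tail length."""
    return tail_length * probe_pmol / 1000.0


if __name__ == "__main__":
    # 80 nmol dTTP with 200 pmol probe, as in the protocol above.
    print(nominal_tail_length(80, 200))      # 400.0
    # dTTP needed for a nominal 100-residue tail on 200 pmol probe.
    print(nucleotide_needed_nmol(100, 200))  # 20.0
```

Under this assumption, the 5 to 160 nmol range given above corresponds to nominal tails of 25 to 800 residues on 200 pmol of probe.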

Once the tailed probe of the invention is synthesized, the probe is then attached to a 
solid support. Suitable solid supports for purposes of the present invention will contain (or 
can be treated to contain) free reactive primary or secondary amino groups capable of 
binding a UV-activated pyrimidine, especially thymine. Secondary amino groups may be 
preferred for purposes of the present invention. There are many ways to assure that a 
solid support (not necessarily nylon) has free, particularly secondary, amino groups. 
Amine-bearing solid supports suitable for purposes of the present invention include 
polyethylenimine (chemisorbed to any solid, such as cellulose or silica with or without 
glutaraldehyde crosslinking) and silica or alumina or glass silanized with amine-bearing 
reagents such as PCR Inc.'s Prosil™ 220, 221, 3128, and 3202 reagents. Manville sells 
controlled porosity glass papers (Biomat™) appropriate for aminoalkyl silanization. One 
may alkylate immobilized primary amines (e.g., with a methyl halide or with formaldehyde 
plus cyanoborohydride, as described by Jentoff et al., 1979, J. Biol. Chem. 254:4359- 
4365). As noted above, one may use a solid support to which polyethylenimine has been 
chemisorbed. Polyvinyl chloride sheets containing PEI-loaded silica are commercially 
available (manufactured by Amerace and sold by ICN as Protrans™ and by Polysciences 
as Poly/Sep™), and PEI loading of cellulose is well known. 

The solid support, also called a substrate, can be provided in a variety of forms, 
including membranes, rods, tubes, wells, dipsticks, beads, ELISA-format plates, and the 
like. A preferred support material is nylon, which contains reactive primary amino groups 
and will react with pyrimidines irradiated with UV light. Preferred solid supports include 
charge modified nylons, such as the Genetrans-45™ membrane marketed by Plasco and 
the ZetaProbe™ membrane marketed by Bio-Rad. 

Having chosen a suitable solid support, one makes the preferred immobilized 
probes of the invention by reacting a tailed oligonucleotide probe with the solid support 
under conditions that favor covalent attachment of the tail to the solid support. In a 
preferred embodiment, the solid support is a membrane, and probe binding results from 
exposure of the probe on the membrane to UV irradiation, which activates the nucleotides 
in the tail, and the activated nucleotides react with free amino groups within the membrane. 
Careful desiccation of tailed probes of the invention spotted onto a suitable solid support 
can also be used to facilitate covalent attachment of the probes to the substrate. One can 
assay for the presence or absence in a solid support matrix of reactive groups capable of 
reacting with oligonucleotides by the procedure described in Example 1. 

As is apparent from the foregoing, a preferred method for preparing the 
immobilized probes of the invention comprises fixing an oligonucleotide probe with a poly- 
dT tail to a nylon membrane by UV irradiation. Although poly dT tails react very 
efficiently to solid supports by the methods of the present invention, efficiency of reaction 
of oligonucleotides with a membrane does not necessarily correlate with hybridization 
efficiency. One may therefore wish to determine the hybridization efficiency of a given 
oligonucleotide probe after immobilization on a solid support. When the hybridization of 
various tailed probes is measured as a function of UV dosage, as shown in Figure 2, one 
observes that the optimum exposure changes with length of a poly-dT tail. Optimal 
exposures are about 20 mJ/cm2 for 800 nt poly-dT tails and about 40 mJ/cm2 for 400 nt 
poly-dT tails. 

At 60 mJ/cm2 exposure, one observes that oligonucleotides with longer tails 
hybridize more efficiently than can be accounted for by the additional amounts of probe 
reacted with and bound to the filter. This increased efficiency is believed to be due to a 
spacing effect: increasing the distance between the membrane and the hybridizing region of 
the immobilized probe may increase hybridization efficiency of the probe. Thus, too much 
UV exposure during immobilization can not only damage the nucleotides in the probe but 
also can reduce the average spacer length and decrease hybridization efficiency. It is 
important to note that because dC tails react less efficiently (as compared to dT tails) with a 
membrane, hybridization efficiency of a poly-dC tailed probe reaches a plateau where loss 
due to UV damage and tail shortening is compensated for by the fixing of new molecules 
to the membrane (see Figures 1 and 2). This characteristic of poly-dC tails may make such 
tails preferred when UV exposure cannot be carefully controlled. 

No matter what the base content of the tail of the probe, one may automate the 
attachment of probe to support in accordance with the method of the present invention. 
One semi-automated means of attachment preferred for positive charge nylon membranes is 
as follows. A commercially available "dot-blot" apparatus can be readily modified to fit 
into a Perkin-Elmer/Cetus Pro/Pette® automated pipetting station; the membrane is then 
placed on top of the dot-blot apparatus and vacuum applied. The membrane dimples under 
the vacuum so that a small volume (5 to 20 µl) of probe applied forms a consistent dot with 
edges defined by the diameter of the dimple. No disassembly of the apparatus is required 
to place and replace the membrane; the vacuum may be kept constant while membranes 
are applied, spotted with probe, and removed. 

Once the probes are spotted onto the membrane, the spotted membrane is treated to 
immobilize the probes. A preferred method for covalently attaching the probes to a nylon 
membrane is as follows. Tailed oligonucleotides in TE buffer (10 mM Tris-HCl, pH = 
8.0, and 0.1 mM EDTA) are applied to a Genetrans-45™ (Plasco) membrane with a 
BioDot™ (Bio-Rad) spotting manifold. The damp membranes, also called "filters," are 
then placed on paper pads soaked with TE buffer; the pads and filters are then placed in a 
UV light box (the Stratalinker 1800™ light box marketed by Stratagene is suitable for this 
purpose) and irradiated at 254 nm under controlled exposure levels. UV dosage can be 
controlled by time of exposure to a particular UV light source or, more preferably, by 
measuring the radiant UV energy with a metering unit. Exposure time typically ranges 
from about 0.1 to 10 minutes, most often about 2 to 3 minutes. The support is preferably 
damp during irradiation, but if the support is dried first, a shorter UV irradiation exposure 
can be used. The irradiated filters are washed in a large volume of a solution composed of 
5X SSPE (1X SSPE is 180 mM NaCl, 10 mM NaH2PO4, and 1 mM EDTA, pH = 7.2) 
and 0.5% sodium dodecyl sulfate (SDS) for about one-half hour at 55 degrees C to remove 
unreacted oligonucleotides. Filters can then be rinsed in water, air dried, and stored at 
room temperature until needed. 
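
When dosage is controlled by exposure time rather than by a metering unit, the time follows from the delivered dose and the irradiance of the lamp at the filter surface (dose = irradiance x time, with 1 mW = 1 mJ/s). The following sketch illustrates the arithmetic only; the irradiance value of 0.5 mW/cm2 is a hypothetical figure chosen for illustration and is not a specification of any light box named above.

```python
def exposure_seconds(dose_mj_per_cm2: float, irradiance_mw_per_cm2: float) -> float:
    """Exposure time (s) needed to deliver a UV dose at a fixed irradiance.

    1 mW/cm2 delivers 1 mJ/cm2 per second, so time = dose / irradiance.
    """
    if irradiance_mw_per_cm2 <= 0:
        raise ValueError("irradiance must be positive")
    return dose_mj_per_cm2 / irradiance_mw_per_cm2


if __name__ == "__main__":
    ASSUMED_IRRADIANCE = 0.5  # mW/cm2 at 254 nm; hypothetical, measure your own source
    for dose in (20, 40, 60, 120):  # mJ/cm2
        seconds = exposure_seconds(dose, ASSUMED_IRRADIANCE)
        print(f"{dose} mJ/cm2 -> {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Because lamp output drifts with age and warm-up, a time computed this way is only as good as the irradiance measurement behind it, which is one reason metered dosage is described as preferable.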

UV irradiation of nucleotides is known to cause pyrimidine photochemical 
dimerization, which, for purposes of the present invention, is not preferred. A number of 
steps can be taken to reduce dimerization during UV mediation, including: applying the 
oligonucleotide probe to the membrane at a high pH, above 9 and preferably above 10; 
applying the probe to the membrane at a very low ionic strength, between 0 and 0.01; using 
the lowest probe concentration that gives the desired signal intensity; and irradiating with 
light excluding wavelengths longer than 250 nm, preferably with no light with a 
wavelength longer than 240 nm. However, the extent to which these steps could impair 
probe immobilization has not been tested. In general, spotting and attachment of the probe 
to the membrane should be done at a temperature and in a solvent that minimizes base- 
pairing and base-stacking in the probes. 

After constructing the novel immobilized probes of the invention, one is ready to 
employ those probes in the useful nucleic acid sequence detection methods of the invention. 
In a preferred embodiment of this method, a sample suspected of containing a target nucleic 
acid sequence is treated under conditions suitable for amplifying the target sequence by 
PCR. Note that the process of "asymmetric" PCR, described by Gyllensten and Erlich, 
1988, Proc. Natl. Acad. Sci. USA 85:7652-7656, for generation of single stranded DNA 
can also be used to amplify the sample nucleic acid. The PCR primers are biotinylated, for 
subsequent detection of hybridized primer-containing sequences. The amplification 
reaction mixture is denatured, unless asymmetric PCR was used to amplify, and then 
applied to a membrane of the present invention under conditions suitable for hybridization 
to occur (most often, sequence-specific hybridization). Hybridized probe is detected by 
binding streptavidin-horseradish peroxidase (SA-HRP, available from a wide variety of 
chemical vendors) to the biotinylated DNA, followed by a simple colorimetric reaction in 
which a substrate such as TMB is employed. One can then determine whether a certain 
sequence is present in the sample merely by looking for the appearance of colored dots on 
the membrane. 

In an especially preferred embodiment of the above method, a filter with 
immobilized oligonucleotides is placed in hybridization solution containing 5X SSPE, 
0.5% SDS, and 100 ng/ml SA-HRP (as marketed under the SeeQuence™ name by Cetus and 
Eastman Kodak). PCR-amplified sample DNA is denatured by heat or by addition of 
NaOH and EDTA and added immediately to the hybridization solution, which contains 
enough SSPE to neutralize any NaOH present. The sample is then incubated at a suitable 
temperature for hybridization to occur (typically, as exemplified below, at 55 degrees C for 
30 minutes). During this incubation, hybridization of product to immobilized 
oligonucleotide occurs as well as binding of SA-HRP to biotinylated product. The filters 
are briefly rinsed in 2X SSPE and 0.1% SDS at room temperature, then washed in the 
same solution at 55 degrees C for 10 minutes, then quickly rinsed twice in 2X PBS (1X 
PBS is 137 mM NaCl, 2.7 mM KCl, 1.5 mM KH2PO4, and 8 mM Na2HPO4, pH = 7.4) 
at room temperature. Color development is performed by incubating the filters in red leuco 
dye or TMB at room temperature for 5 to 10 minutes. Photographs are taken of the filters 
after color development for permanent records. 
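
The wash and hybridization solutions above are specified as multiples ("5X", "2X") of a 1X recipe. As a convenience, the following sketch scales the 1X SSPE and 1X PBS compositions given in the text to any working strength. The dictionary and function names are illustrative only, and the sketch covers solute concentrations only, not pH adjustment.

```python
# 1X compositions in mM, taken from the text above.
SSPE_1X_MM = {"NaCl": 180.0, "NaH2PO4": 10.0, "EDTA": 1.0}
PBS_1X_MM = {"NaCl": 137.0, "KCl": 2.7, "KH2PO4": 1.5, "Na2HPO4": 8.0}


def at_strength(recipe_1x_mm: dict, fold: float) -> dict:
    """Return component concentrations (mM) for an N-fold ("NX") solution."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: round(conc * fold, 6) for name, conc in recipe_1x_mm.items()}


if __name__ == "__main__":
    print("5X SSPE:", at_strength(SSPE_1X_MM, 5))
    print("2X SSPE:", at_strength(SSPE_1X_MM, 2))
    print("2X PBS: ", at_strength(PBS_1X_MM, 2))
```

For example, 5X SSPE works out to 900 mM NaCl, 50 mM NaH2PO4, and 5 mM EDTA.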

Although the detection method described above is preferred, those of skill in the art 
recognize that the immobilized probes of the invention can be utilized in a variety of 
detection formats. One such format involves labeling the immobilized probe itself, instead 
of the sample nucleic acid. If the probe is labeled at or near the end of the hybridizing 
sequence (far from the site of attachment of the probe to the solid support), one can treat the 
potentially hybridized sample DNA with an appropriate restriction enzyme, i.e., one that 
cleaves only duplex nucleic acids at a sequence present in the hybridizing region of the 
probe, so that restriction releases the label from the probe (and the membrane) for detection 
of hybridization. Suitable labels include peroxidase enzymes, acid phosphatase, 
radioactive atoms or molecules (e.g., 32P, 125I, etc.), fluorophores, dyes, biotin, ligands 
for which specific monoclonal antibodies are available, and the like. If the primer or one or 
more of the dNTPs utilized in a PCR amplification has been labeled (for instance, the 
biotinylated dUTP derivatives described by Lo et al., 1988, Nuc. Acids Res. 16:8719), 
instead of the immobilized probe, then, as noted above, hybridization can be detected by 
assay for presence of label bound to the membrane. 

The immobilized probes of the invention can also be used in detection formats in 
which neither probe nor primer is labeled. In such a format, hybridization can be detected 
using a labeled "second probe." The second probe is complementary to a sequence 
occurring within the target DNA, but not overlapping the bound probe sequence; after 
hybridization of the second probe, the immobilized probe, second probe, and target 
sequence form a nucleic acid "sandwich," the presence of which is indicated by the 
presence of the label of the second probe on the membrane. One could also employ 
monoclonal antibodies (or other DNA binding proteins) capable of binding specifically to 
duplex nucleic acids (e.g., dsDNA) in a detection format that uses no labeled nucleic acids. 
Those of skill in the art will recognize that one important advantage of the immobilized 
probes of the invention is the ability, at least with most detection formats, to recycle the 
support-bound probe by denaturing the hybridized complex, eluting the sample DNA, and 
treating the support (for example, by washing, bleaching, etc.) to remove any remaining 
traces of extraneous DNA, label, developer solution, and immobilized dye (see U.S. Patent 
No. 4,789,630 and PCT application No. 88/0287, incorporated herein by reference). 

Those of skill in the art will recognize the many and diverse uses for the 
immobilized probes of the present invention. One exciting application of these immobilized 
probes is in conjunction with the technique of simultaneous amplification of several DNA 
sequences ("multiplex" PCR). Such simultaneous amplification can be used to type at 
many different loci with a single membrane. For instance, one can type for the 
polymorphic HindIII site in the Ggamma gene (see Jeffreys, 1979, Cell 18:1-10), the 
polymorphic AvaII site in the low density lipoprotein receptor gene (see Hobbs et al., 
1987, Nuc. Acids Res. 15:379), and for polymorphisms in the HLA DQalpha gene 
simultaneously by amplifying all three loci in a single PCR and applying the amplified 
material to a suitable set of immobilized probes of the present invention. Other genetic 
targets whose analysis would be simplified by this technique include the detection of 
somatic mutations in the ras genes, where six loci and 66 possible alleles occur (see 
Verlaan-de Vries et al., 1986, Gene 50:313-320); the typing of DNA polymorphisms at the 
HLA DP locus; the detection of beta-thalassemia in Middle Eastern populations, where in 
addition to the endogenous mutations, Mediterranean and Asian Indian mutations are 
present at significant frequencies; the detection of infectious pathogens; and the detection of 
microorganisms in environmental surveys. 

In many of these applications, it will be desirable to obtain a membrane on which 
are immobilized a diverse set of oligonucleotides specific for different sequences that 
nevertheless can hybridize under the same sequence-specific hybridization conditions. If 
necessary, this situation can be achieved by adjusting the length, position, and strand- 
specificity of the probes, or by varying the amount of probe applied to the membrane, or by 
adding a salt, such as tetramethylammonium chloride, to the hybridization buffer to 
minimize differences among immobilized oligonucleotides caused by varying base 
compositions (see Wood et al., 1985, Proc. Natl. Acad. Sci. USA 82:1585-1588). 

The examples below illustrate various useful embodiments of the invention and 
enable the skilled artisan to appreciate the invention more fully and so should not be 
construed as limiting the scope of the invention in any way. 

Example 1 
Probe Retention and Hybridization Efficiency 

The stable binding of poly-dT-tailed probe sequences to nylon as a function of tail 
length and UV exposure was examined as described below. A 19-base oligonucleotide 
(RS18: 5'-CTCCTGAGGAGAAGTCTGC) was labeled at its 5' end with gamma-32P- 
ATP and T4 polynucleotide kinase (see Saiki et al., 1986, Nature 324:163-166). Portions 
of the kinased probe were then tailed with dTTP and terminal transferase (TdT), as 
described by Roychoudhury et al. RS18 was present at a concentration of 2 µM, TdT 
(Ratliff Biochemicals, Los Alamos, NM) at 600 U/ml, and either dCTP or dTTP at either 0 
µM, 50 µM, 100 µM, 200 µM, 400 µM, or 800 µM to prepare constructs with either dC 
or dT tails of approximately 0, 25, 50, 100, 200, 400 or 800 dT bases or 400 dC bases per 
molecule. Reaction mixtures were incubated for 60 minutes at 37°C and were terminated 
by addition of an equal volume of 10 mM EDTA. 

Four pmol of each sample diluted in 100 µl of TE buffer were spotted onto nine 
duplicate filters (Genetrans-45 nylon, Plasco, Woburn, Mass.), UV irradiated for various 
times, washed to remove unbound oligonucleotides, and then each spot was measured by 
scintillation counting to determine the amount of probe crosslinked to the nylon membrane. 
The values plotted in Figure 1 are relative to an unirradiated, unwashed control filter (100% 
retention). UV irradiation was accomplished by placing the filters in a Stratalinker 1800™ 
UV light box and irradiating the filters at 254 nm. Dosage was controlled using the internal 
metering unit of the device. The filters were then washed in 5X SSPE, 0.5% SDS for 30 
minutes at 55°C to remove DNA not stably bound. The results plotted in Figure 1 show 
that even the non-tailed probe was retained by the membrane, but that retention of the 
untailed probe was not greatly improved by UV irradiation. The 400-dT tailed probe 
exhibited >90% retention after suitable exposure. 

However, high retention does not necessarily correlate with high hybridization 
efficiency. Thus, hybridization efficiency was measured as follows. Probes were 
prepared with poly-dT tails as above, but with unlabeled RS18. The probes were spotted 
onto filters and UV irradiated, and excess probe was washed from the membrane by 
incubating the membrane in 5X SSPE, 0.5% SDS, for 30 minutes at 55°C. The 
membranes were then hybridized with 5 pmol of complementary 32P-kinase labeled 40-mer 
(RS24: 5'-CCCACAGGGCAGTAACGGCAGACTTCTCCTCAGGAGTCAG, specific 
activity of 1.5 µCi/pmol) in a solution (10 ml) containing 5X SSPE and 0.5% SDS, at 55°C 
for 20 minutes, which are sequence-specific hybridization conditions. The membrane was 
then washed first with 2X SSPE, 0.1% SDS (3 x 100 ml) for 2 minutes at about 25°C, 
then in 2X SSPE, 0.1% SDS at 55°C for 5 minutes. The individual spots were excised, 
counted, and the counts plotted against UV exposure, as shown in Figure 2. The values 
plotted are fmol RS24 hybridized to the membrane. The results show that none of the non- 
tailed probe was able to hybridize under the conditions used, even though as much as 50% 
of the applied RS18 should be bound to the membrane. All of the tailed probes were able 
to hybridize, with hybridization efficiency increasing with increasing tail length. Optimal 
UV exposures were from about 60 to 120 mJ/cm2. 
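
The complementarity between the immobilized 19-mer (RS18) and the labeled 40-mer (RS24) used in this example can be checked mechanically. The short sketch below is an illustration added for the reader, not part of the experimental procedure; it takes the two sequences as printed in this example and locates the reverse complement of RS18 within RS24.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


RS18 = "CTCCTGAGGAGAAGTCTGC"                       # immobilized probe
RS24 = "CCCACAGGGCAGTAACGGCAGACTTCTCCTCAGGAGTCAG"  # labeled 40-mer

target_site = reverse_complement(RS18)
offset = RS24.find(target_site)  # 0-based position in RS24, -1 if absent

if __name__ == "__main__":
    print("RS18 reverse complement:", target_site)
    print("found in RS24 at 0-based offset:", offset)
    print("unpaired RS24 bases 5' / 3' of the duplex:",
          offset, len(RS24) - offset - len(RS18))
```

The full 19-nt hybridizing region of RS18 is matched by an internal stretch of RS24, leaving single-stranded overhangs on both sides of the duplex.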

Example 2 
Sandwich Assay for Sickle-Cell Anemia 
Two allele-specific probes were prepared, one for the normal beta-globin allele, 
called RS18, and one for the sickle-cell allele, RS21 (5'CTCCTGTGGAGAAGTCTGC); 
each probe had a 400 nt poly-dT tail. If desired, a probe for the hemoglobin C allele can be 
prepared with the sequence: 5'CTCCTAAGGAGAAGTCTGC. Eight replicate filters 
were prepared and spotted with 4, 2, 1, and 0.5 pmol of each tailed probe using the method 
set forth in Example 1, and then UV irradiated by placing the filters, DNA-side down, 
directly onto a TM-36 Transilluminator UV light box (U.V. Products, San Gabriel, CA) 
for 5 minutes. Four 1 µg samples of genomic DNA (from cell lines Molt4 (betaAbetaA), 
SC-1 (betaSbetaS), M + S (betaAbetaS), and GM2064 (beta^beta^), a beta-globin deletion 
mutant) were subjected to 30 PCR amplification cycles with the primer pair PC03 
(5'ACACAACTGTGTTCACTAGC) and KM38 (5'TGGTCTCCTTAAACCTGTCTTG). 
PCR was carried out in substantial accordance with the procedure described by 
Saiki et al., 1988, Science 239:1350-1354. The DNA samples were amplified in 100 µl of 
reaction buffer containing 50 mM KCl, 10 mM Tris-HCl (pH = 8.4), 1.5 mM MgCl2, 100 
µg/ml gelatin, 200 µM each of dATP, dCTP, dGTP, and dTTP, 0.2 µM of each primer, 
and 2.5 units of Taq DNA polymerase (Perkin-Elmer/Cetus Instruments). The cycling 
reaction was performed on a programmable heat block, the DNA Thermal Cycler, available 
from PECI, set to heat at 95 degrees C for 15 seconds (denature), cool at 55 degrees C for 
15 seconds (anneal), and incubate at 72 degrees C for 30 seconds (extend) using the Step- 
Cycle program. After 30 cycles, the samples were incubated an additional 5 minutes at 72 
degrees C. 
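
The allele-specific probes in this example differ from one another at a single base, which is what allows sequence-specific hybridization to discriminate the alleles. As an illustration only (the helper function is hypothetical and not part of the procedure), the following sketch compares the normal-allele probe RS18 from Example 1 with the sickle-cell probe RS21 and reports where they differ.

```python
def mismatches(a: str, b: str) -> list:
    """List (1-based position, base in a, base in b) where two equal-length
    sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


RS18 = "CTCCTGAGGAGAAGTCTGC"  # normal beta-globin allele probe (Example 1)
RS21 = "CTCCTGTGGAGAAGTCTGC"  # sickle-cell allele probe

if __name__ == "__main__":
    for position, normal_base, sickle_base in mismatches(RS18, RS21):
        print(f"position {position}: {normal_base} (RS18) vs {sickle_base} (RS21)")
```

The two 19-mers differ only at position 7 (A in RS18, T in RS21), so each probe forms a perfect duplex with its own allele and a single-mismatch duplex with the other.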

Each amplification product (18 µl) was denatured by heating at 95°C for 5 minutes 
in 1 ml of TE, and then quenched on ice. A solution (4 ml) of 6.25X SSPE, 6.25X 
Denhardt's, and 0.625% SDS was mixed with 1 ml of each denatured PCR product, 
hybridized to one of the filters for 15 minutes at 55°C, washed with 2X SSPE, 0.1% SDS 
(3 x 100 ml) for 2 minutes at about 25°C, and then washed with 2X SSPE, 0.1% SDS (1 
x 100 ml) for 5 minutes at 55°C. 
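
The 6.25X stock strength used here is chosen so that, after the 1 ml sample is added to 4 ml of stock, the working hybridization mix is at 5X SSPE, 5X Denhardt's, and 0.5% SDS. The sketch below shows that dilution arithmetic; it assumes, as a simplification, that the denatured sample (in TE) contributes none of these components, and the function name is illustrative.

```python
def final_strength(stock_strength: float, stock_ml: float, sample_ml: float) -> float:
    """Strength of a component after mixing stock with a sample.

    Assumes the sample itself contains none of the component.
    """
    total_ml = stock_ml + sample_ml
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return stock_strength * stock_ml / total_ml


if __name__ == "__main__":
    # 4 ml of concentrated hybridization solution + 1 ml denatured PCR product
    print("SSPE (X):      ", final_strength(6.25, 4, 1))   # 5.0
    print("Denhardt's (X):", final_strength(6.25, 4, 1))   # 5.0
    print("SDS (%):       ", final_strength(0.625, 4, 1))  # 0.5
```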

The membranes were then equilibrated in 2X SSPE, 0.1% (v/v) Triton X-100 (100 
ml) for 3 minutes at about 25°C to remove SDS. All of the filters were then hybridized in 
the same buffer with a horseradish peroxidase (HRP) labeled 15-mer, RS111 (5'- 
GCAGGTTGGTATCAA), specific for the PC03/KM38 amplification product, prepared by 
the method disclosed in PCT publications WO 89/02931 and 89/02932, incorporated herein 
by reference. 

These methods essentially involve derivatizing the nucleic acid probe using a linear 
linking molecule comprising a hydrophilic polymer chain (e.g., polyoxyethylene) having a 
phosphoramidite moiety at one end and a protected sulfhydryl moiety at the other end. The 
phosphoramidite moiety couples to the nucleic acid probe by reactions well known in the 
art (e.g., Beaucage et al., 1981, Tetrahedron Lett. 22:1859-1862), while the deprotected 
sulfhydryl group can form disulfide or other covalent bonds with a protein, e.g., HRP. 
The HRP is conjugated to the linking molecule through an N-maleimido-6-aminocaproyl 
group. The label is prepared by esterifying N-maleimido-6-aminocaproic acid with sodium 
4-hydroxy-3-nitrobenzene sulfonate in the presence of one equivalent of 
dicyclohexylcarbodiimide in dimethylformamide. After purification, the product is added 
to phosphate buffer containing HRP at a weight ratio of 8:1 HRP to ester. The 
oligonucleotide probe is synthesized in a DNA synthesizer, and the linking molecule having 
the structure (C6H5)3CS-(CH2CH2O)4-P(CH2CH2CN)[N(i-Pr)2] is attached using 
phosphoramidite synthesis conditions. The trityl group is removed, and the HRP 
derivative and probe derivative are mixed together and allowed to react to form the labeled 
probe. A biotin-labeled probe may be prepared by similar methods. 

The membrane was incubated with 4 pmol RS111 in a solution (8 ml) composed of 
5X SSPE, 5X Denhardt's, and 0.5% Triton X-100 for 10 minutes at 40°C. Following 
incubation, the membrane was washed with 2X SSPE, 0.1% Triton X-100 (3 x 100 ml) 
for 2 minutes at about 25°C. 

The reaction was followed by color development (Sheldon et al., 1986, Proc. Natl. 
Acad. Sci. USA 83:9085-9089) with TMB/H2O2, as shown in Figure 3. The membranes 
were soaked in 100 ml of color development buffer B (CDB-B: 237 mM NaCl, 2.7 mM 
KCl, 1.5 mM KH2PO4, 8.0 mM Na2HPO4, pH 7.4, 5% (v/v) Triton X-100, 1 M urea, 
and 1% dextran sulfate), followed by washes with 2 x 100 ml of CDB-C (100 mM sodium 
citrate, pH 5.0) for 2 minutes at room temperature. Color was developed by replacing the 
CDB-C solution with 100 ml of CDB-D (100 mM sodium citrate, pH 5.0, 0.1 mg/ml 
3,3',5,5'-tetramethylbenzidine), adding 50 µl of 3% H2O2, and allowing the color to 
develop for 30 minutes. The beta-globin genotypes of the amplified DNA samples were 
readily apparent from the filters, and good signal intensity was obtained even from the 0.5 
pmol spot. 

Example 3 
rwrt Assay for Sickle-Cell Anemia 
A second set of DNAs (as described in Example 2 above) was amplified with PC03 
and BW19. BW19 has the sequence 5'-CAACTTCATCCACGTTCACC, and is covalently
bound to a molecule of biotin at the 5' end. Twelve µl of each of these amplification
products were denatured as described in Example 2, added to 4 ml of hybridization buffer 
(6.25X SSPE, 6.25X Denhardt's, and 0.625% SDS), and incubated with the membrane- 
bound probe (the remaining four filters from Example 2 above) at 55°C for 15 minutes. 
The membranes were then washed with 2X SSPE, 0.1% SDS (3 x 100 ml) for 3 minutes
at room temperature, followed by a wash with 2X SSPE, 0.1% SDS (1 x 100 ml) for 5
minutes at 55°C.

The membranes were pooled together and equilibrated in 100 ml of CDB-A for 5
minutes at about 25°C (CDB-A: 237 mM NaCl, 2.7 mM KCl, 1.5 mM KH2PO4, 8.0 mM
Na2HPO4, pH 7.4, and 5% Triton X-100). The membranes were then placed in a heat-
sealable bag with 10 ml of CDB-A and See-Quence™ SA-HRP conjugate (Cetus
Corporation, Emeryville, CA) at 0.3 µg/ml and gently shaken for 10 minutes at about
25°C. Excess conjugate was removed by washing with CDB-A (3 x 100 ml for 3 minutes
at 25°C), CDB-B (1 x 100 ml for 5 minutes at 25°C), and CDB-C (2 x 100 ml for 3
minutes at 25°C). The membranes were then equilibrated in CDB-D (100 ml) for 5 minutes
at room temperature, followed by addition of 50 µl of 3% H2O2. The color was allowed to
develop for 30 minutes with gentle shaking, followed by washing under deionized water
for 10 minutes. The four filters after color development are shown in Figure 4. The
intensity and specificity of signals detected by this method compare favorably to those
obtained by sandwich hybridization. The faint signal in GM2064 was due to beta-globin
contamination in that DNA sample.

Example 4 
HLA DQalpha Genotyping
The DQalpha test is derived from a PCR-based oligonucleotide typing system that
partitions the polymorphic variants at the DQalpha locus into four major DNA types
denoted DQA1, DQA2, DQA3, and DQA4, three DQA4 subtypes, DQA4.1, DQA4.2, and
DQA4.3, and three DQA1 subtypes, DQA1.1, DQA1.2, and DQA1.3 (see Higuchi et al.,
1988, Nature 332:543-546 and Saiki et al., 1986, Nature 324:163-166).

Four oligonucleotides specific for the major types, four oligonucleotides that
characterize the subtypes, and one control oligonucleotide that hybridizes to all allelic
DQalpha sequences were given 400 nt poly-dT tails and spotted onto 12 duplicate nylon
filters. About 2 to 10 pmol of each probe were placed in each spot.

With regard to the amount of probe spotted, however, one may wish to employ lower
amounts of RH54 and GH64, i.e., 0.035 pmol RH54 per spot is preferred. These probes
are positive control probes for amplification and will hybridize to any DQalpha alleles under
the conditions described. By reducing the amount of positive control probe on the
membrane, one can make the positive control probe the least sensitive probe on the
membrane. Then, if insufficient amplified DQalpha DNA is applied to the membrane, one
can recognize the problem, for the positive control probe will not react or will react only
very weakly. Otherwise, when insufficient sample DNA is applied to the membrane, one
runs the risk of misreading a heterozygous type as a homozygous type, because some
probes hybridize less efficiently than others.

After spotting, the membranes were irradiated at 40 mJ/cm2. The sequences of the
hybridizing regions of the resulting immobilized probes are shown below. 



DQA Type                 Designation   Sequence

A1                       GH75          5'-CTCAGGCCACCGCCAGGCA or
                         RH83          5'-GAGTTCAGCAAATTTGGAG
A2                       RH71          5'-TTCCACAGACTTAGATTTG or
                         RH82          5'-TTCCACAGACTTAGATTTGAC
A3                       GH67          5'-TTCCGCAGATTTAGAAGAT
A4                       GH66          5'-TGTTTGCCTGTTCTCAGAC
A1.1                     GH88          5'-CGTAGAACTCCTCATCTCC
A1.2, 1.3, 4             GH89          5'-GATGAGCAGTTCTACGTGG
A1.3                     GH77          5'-CTGGAGAAGAAGGAGAC
not A1.3                 GH76          5'-GTCTCCTTCCTCTCCAG
all                      RH54          5'-CTACGTGGACCTGGAGAGGAAGGAGACTGCCTG or
                         GH64          5'-TGGACCTGGAGAGGAAGGAGACTG
A4.2, A4.3, not A4.1     HE01          5'-CATCGCTGTGACAAAACAT



Although most of the probes are uniquely specific for one DQA type, two of the DQA1
subtyping probes cross-hybridize to several DNA types. GH89 hybridizes to a sequence
common to the DQA1.2, 1.3, and 4 types, and the probe GH76 detects all DQA types
except DQA1.3. The GH76 probe is needed to distinguish DQA1.2/1.3 heterozygotes
from DQA1.3/1.3 homozygotes. Further, the length and strand specificity of the probes
were adjusted so that their relative hybridization efficiencies and stringency requirements 
for allelic discrimination were approximately the same. These eight probes produce a 
unique hybridization pattern for each of the 21 possible DQA diploid combinations. 
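
The figure of 21 follows from six distinguishable alleles (DQA1.1, 1.2, 1.3, 2, 3, and 4): 6 homozygous plus 15 heterozygous combinations. The Python sketch below enumerates these genotypes and checks that the expected probe patterns are all distinct. The allele-to-probe map is an assumption inferred from the specificities listed in the table, in particular that the A1 probe reacts with all three DQA1 subtypes; the labels A1 to A4 stand for the major-type probes and are illustrative only.

```python
from itertools import combinations_with_replacement

# Assumed reactivity of each allele with the eight typing probes
# (major-type probes A1-A4 plus subtyping probes GH88, GH89, GH77, GH76).
ALLELE_PROBES = {
    "1.1": {"A1", "GH88", "GH76"},
    "1.2": {"A1", "GH89", "GH76"},
    "1.3": {"A1", "GH89", "GH77"},
    "2": {"A2", "GH76"},
    "3": {"A3", "GH76"},
    "4": {"A4", "GH89", "GH76"},
}


def genotype_patterns():
    """Map each diploid genotype to the set of probes expected to give a signal."""
    patterns = {}
    for a, b in combinations_with_replacement(ALLELE_PROBES, 2):
        # A spot is positive if either allele of the genotype reacts with it.
        patterns[(a, b)] = frozenset(ALLELE_PROBES[a] | ALLELE_PROBES[b])
    return patterns


patterns = genotype_patterns()
print(len(patterns), "genotypes,", len(set(patterns.values())), "distinct patterns")
```

Under these assumptions, the 1.2/1.3 heterozygote and the 1.3/1.3 homozygote differ only at GH76, which is consistent with the role the text assigns to that probe.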
The sequence variation that defines the DQalpha DNA types is localized within a
relatively small hypervariable region of the second exon that can be encompassed within a
single 242 bp PCR amplification product (see Horn et al., 1988, Proc. Natl. Acad. Sci.
USA 85:6012-6016). The primers are shown below.

Primer RS134 is 5'-GTGCTGCAGGTGTAAACTTGTACCAG
Primer RS135 is 5'-CACGGATCCGGTAGCAGCGGTAGAGTTG
Biotinylated PCR primers were used to amplify this 242 bp DQalpha sequence from several 
genomic DNA samples: six homozygous cell lines and six heterozygous individuals. 

The biotinylated primers were synthesized as follows. Primary amino groups were 
introduced at the 5' termini of the primers by a variation of the protocols set forth in Coull 
et al., 1986, Tetrahedron Lett. 27:3991-3994 and Connolly, 1987, Nuc. Acids Res.
15:3131-3139. Briefly, tetraethylene glycol was converted to the mono-phthalimido
derivative by reaction with phthalimide in the presence of triphenylphosphine and 
diisopropyl azodicarboxylate (see Mitsunobu, 1981, Synthesis , pp. 1-28). The mono- 
phthalimide was converted to the corresponding beta-cyanoethyl diisopropylamino 
phosphoramidite as described in Sinha et al., 1984, Nuc. Acids Res. 12:4539. The 
resulting phthalimido amidite was added to the 5' ends of the oligonucleotides during the 
final cycle of automated DNA synthesis using standard coupling conditions. During 
normal deprotection of the DNA (concentrated aqueous ammonia for five hours at 55 
degrees C), the phthalimido group was converted to a primary amine which was 
subsequently acylated with an appropriate biotin active ester. LC-NHS-biotin (Pierce) was 
selected for its water solubility and lack of steric hindrance. The biotinylation was 
performed on crude, deprotected oligonucleotide and the mixture purified by a combination 
of gel filtration and reversed-phase HPLC (see Levenson et al., 1989, in PCR Protocols
and Applications - A Laboratory Manual, eds. Innis et al., Academic Press, NY).

After hybridization of the amplified DNA to the membranes and color development, 
the DQalpha genotypes of these samples are readily apparent, as is shown in Figure 5. In
Figure 5, the specificity of each immobilized probe is noted at the top of the filters and the 
DQA genotype of each sample is noted at the right of the corresponding filter. 

The immobilized probes of the invention have so facilitated the method of DNA 
typing at the HLA DQalpha locus that kits for typing will be important commercially. These 
kits can come in a variety of forms, but a preferred embodiment of the kit is 
described in detail, below. This description is followed by a description of simplified
typing protocols for use with the kit.

A preferred kit will contain one or more vials of pre-aliquoted, "sterilized" (see
below) DQalpha PCR amplification mixes, typically in concentrated (2X is preferred) form
and pre-aliquoted in 50 µl aliquots. Each 50 µl aliquot will contain: 5 µmol KCl, 1 µmol
Tris-HCl (pH = 8.3), 250 nmol MgCl2, 15 pmol of biotinylated RS134, 15 pmol of
biotinylated RS135, 18.75 nmol each of dGTP, dATP, dTTP, dCTP, and from 2.5 units
up to 50 units of recombinant Taq polymerase (PECI). The dNTPs will be prepared from
stock solutions at pH = 7. The sterilization protocol also introduces very low levels of
inactivated DNAse and NaCl, as noted below.

The "sterilized" reagents referred to above relate to the need to avoid contamination
of reagents with non-sample-derived nucleic acid sequences. Because PCR is such a
powerful amplification method, contaminating molecules can lead to error. To avoid this
contamination problem, the present invention provides a novel sterilization procedure. This
procedure employs a DNAse, preferably bovine pancreas DNAse, to remove low levels of
DNA contamination from batches of PCR reaction mix. Because DNA primers are
sensitive to this enzyme, the primers are omitted from the batch until the DNAse has been
inactivated by thermal denaturation. However, if RNA primers are to be employed in the
PCR mixture, the primers can be present during sterilization. In addition, derivatized
nucleotides can be used to make an oligonucleotide resistant to DNAse; for instance, thio-
substituted nucleotides, such as phosphorothioates, can be used to prepare oligonucleotides
resistant to DNAse (see Spitzer and Eckstein, 1988, Nuc. Acids Res. 16:11,691). Those of
skill in the art recognize that an equivalent sterilization procedure utilizes a restriction
enzyme that cleaves a sequence present in the amplification target; if any contaminating
target is present, the restriction enzyme will cleave the contaminant, rendering it unavailable
for amplification.

In the preferred sterilization procedure, however, 2.5 ml of 10X Taq buffer (100
mM Tris-HCl, pH = 8.3; 500 mM KCl; and 25 mM MgCl2) are autoclaved and added to
0.19 ml of a solution that is 25 mM in each dNTP, 0.13 ml of Taq DNA polymerase at a
concentration of 5 U/µl, and 8.75 ml of glass distilled water. The mixing of these reagents
can be conveniently carried out in a 50 ml polypropylene tube. Once the mixture is
prepared, 650 U of DNAse I (Cooper Biomedicals; 2500 U/ml in 150 mM NaCl, stored
frozen) are added and the resulting solution incubated at 37 degrees C for 15 minutes. The
DNAse is inactivated by incubating the mixture at 93 degrees C for 10 minutes. Then,
0.38 ml of each primer at 10 µM is added to the sterilized reagent, which is then aliquoted,
preferably with a "dedicated" pipettor and in the protected confines of a laminar flow hood.
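
As a rough consistency check, the batch recipe above can be scaled down to a single 50 µl aliquot and compared with the per-aliquot amounts listed for the kit. The Python sketch below does this arithmetic. It assumes that volumes are simply additive and that the DNAse I volume is the 650 U divided by the stated 2500 U/ml stock concentration; the variable and function names are illustrative only.

```python
# Volumes (ml) taken from the sterilization recipe in the text.
buffer_ml, dntp_ml, taq_ml, water_ml = 2.5, 0.19, 0.13, 8.75
dnase_ml = 650 / 2500   # assumed: 650 U of DNAse I drawn from a 2500 U/ml stock
primer_ml = 0.38        # added once for each of the two primers
total_ml = buffer_ml + dntp_ml + taq_ml + water_ml + dnase_ml + 2 * primer_ml

ALIQUOT_ML = 0.050      # one 50 µl aliquot


def per_aliquot(stock_conc, stock_ml):
    """Amount of a component in one aliquot.

    The result is in (stock concentration unit) x ml, for example
    mM x ml = µmol, µM x ml = nmol, U/ml x ml = U.
    """
    return stock_conc * stock_ml * ALIQUOT_ML / total_ml


amounts = {
    "KCl (µmol)": per_aliquot(500, buffer_ml),
    "Tris-HCl (µmol)": per_aliquot(100, buffer_ml),
    "MgCl2 (nmol)": per_aliquot(25, buffer_ml) * 1000,
    "each dNTP (nmol)": per_aliquot(25, dntp_ml) * 1000,
    "each primer (pmol)": per_aliquot(10, primer_ml) * 1000,
    "Taq polymerase (U)": per_aliquot(5000, taq_ml),  # 5 U/µl = 5000 U/ml
}

print(f"total batch volume: {total_ml:.2f} ml")
for name, value in amounts.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the batch comes to roughly 12.6 ml, and each aliquot holds close to 5 µmol KCl, 1 µmol Tris-HCl, 250 nmol MgCl2, about 19 nmol of each dNTP, 15 pmol of each primer, and about 2.6 units of Taq polymerase.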

The kit can optionally contain the PCR reagents above, but must contain the
immobilized probes of the invention, which can be prepared as described above with a
blotting and automated pipetting device. The solid support can be conveniently marked by
silk-screening. The kit can also contain SA-HRP at a concentration of 20 µg/ml HRP,
which correlates to 250 nmol/ml SA. The SA-HRP is supplied in a buffer composed of 10
mM ACES, 2 M NaCl, at a pH = 6.5. The kit can also contain a concentrated (5X to 20X)
solution of chromogen, such as leuco dye (as is marketed by Kodak in the Sure-Cell™
diagnostic kits) or TMB.

The kit will also be more successful if simple, easy-to-follow instructions are
included. Typical instructions for a preferred embodiment of the present detection method
are as follows. About 2 (if hybridization is carried out in a sealed bag) to 3 (if
hybridization is carried out in a trough) ml of hybridization solution (5X SSPE, 0.5%
SDS, and, in some instances, 1% dextran sulfate, which aids in color retention; M.W.
500,000 is used, although other M.W. forms would work) are pre-warmed to 55 degrees C
prior to use. The sample DNA is amplified by PCR using biotinylated primers, and the
biotinylated product is heated to 95 degrees C for 3 to 5 minutes to denature the DNA.
Denaturation can also be accomplished by adding 5 µl of 5 M NaOH to 100 µl of PCR
product (final NaOH concentration is 250 mM). About 15 µl of SA-HRP stock (20 µg/ml,
stored at 4 degrees C and never frozen) are then added to the 2 to 3 ml of hybridization
solution, and then 20 µl of the still hot, denatured PCR product are added to the mixture.
If alkali denaturation is used, then one needs to use more PCR product to maintain the
same level of sensitivity attained with heat denaturation. Typically 25 to 50 µl of PCR
product are used with 20 to 40 µl of the SA-HRP stock solution. Best results are obtained
when the streptavidin and the biotin are in approximate molar equivalency, i.e., about
300 ng of SA-HRP (measured in HRP) are used for every 6 pmol of biotinylated PCR
product used for hybridization. The PCR product should always be added last and
immediately after denaturation.
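
The quantities in these instructions can be cross-checked with a little arithmetic. The Python sketch below assumes the SA-HRP stock is 20 µg/ml (measured as HRP) and uses the stated ratio of about 300 ng of SA-HRP per 6 pmol of biotinylated product; the function names are hypothetical and serve only to illustrate the calculation.

```python
SA_HRP_STOCK_UG_PER_ML = 20.0    # stock concentration, measured as HRP
NG_SA_HRP_PER_PMOL = 300.0 / 6   # about 300 ng SA-HRP per 6 pmol biotinylated product


def sa_hrp_volume_ul(biotin_pmol):
    """Volume of SA-HRP stock (µl) giving approximate molar equivalency."""
    ng_needed = biotin_pmol * NG_SA_HRP_PER_PMOL
    # 1 µg/ml equals 1 ng/µl, so ng divided by (µg/ml) gives µl.
    return ng_needed / SA_HRP_STOCK_UG_PER_ML


def final_naoh_mM(naoh_ul=5.0, naoh_molar=5.0, product_ul=100.0):
    """Final NaOH concentration (mM) after adding alkali to the PCR product."""
    return naoh_molar * 1000 * naoh_ul / (naoh_ul + product_ul)


print(sa_hrp_volume_ul(6))          # stock volume for 6 pmol of product
print(round(final_naoh_mM(), 1))    # NaOH concentration for 5 µl of 5 M into 100 µl
```

For 6 pmol of product this gives 15 µl of stock, matching the volume quoted in the instructions. Counting the added 5 µl, the alkali step works out to about 238 mM NaOH, which the text rounds to 250 mM.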

If the hybridization is carried out in a sealed bag, all air bubbles should be removed 
prior to sealing the bag. If the hybridization is carried out in a trough, the entire trough 
should be firmly covered with a glass plate. Hybridization is carried out for 20 minutes at 
55 degrees C in a shaking water bath set at a moderate to high shaking speed, i.e., 50 to 
200 rpm. The wash solution (2X SSPE, 0.1% SDS) is pre-warmed to 55 degrees C during
the hybridization step. After hybridization, all filters are placed in a bowl containing 200 to
300 ml of pre-warmed wash solution and washed for 8 to 10 minutes in a shaking water
bath at 55 degrees C.

Color development is accomplished at room temperature and usually in a shaking
water bath as follows if the chromogen is TMB. The filters are rinsed in 200 to 300 ml of
room temperature wash solution for 5 minutes, then transferred to 200 to 300 ml of Buffer
C (100 mM NaCitrate, pH = 5.0) and rinsed for 5 minutes, then incubated for 5 minutes in
40 ml of Buffer C containing 2 ml of TMB (2 mg/ml in 100% ethanol and stored at 4
degrees C), then transferred to a fresh dye solution (composed of 40 ml of Buffer C and 2
ml of TMB) containing 4 µl of 30% hydrogen peroxide, and the color is allowed to develop
for 5 to 15 minutes. The color development is stopped by rinsing the filters twice with
water; the filters can be dried and stored if protected from light. If the typing is weak (faint
dots), the procedure is repeated using 50 µl of the PCR product and 40 µl of the SA-HRP
during the hybridization step. If leuco dye is used in place of TMB, then one replaces the
Buffer C rinse with a rinse in 200 to 300 ml of 1X PBS, after which the filters are placed in
25 ml of a mixture of the dye and hydrogen peroxide (the same formulation as in Kodak
Sure-Cell™ kits). The development time is 5 to 10 minutes; color development is stopped
by washing the filters twice in PBS.

Example 5 
Detection of Beta-thalassemia Mutations

Although there are over 54 characterized mutations of the beta-globin gene that can 
give rise to beta-thalassemia, each ethnic group in which this disease is prevalent has a 
limited number of common mutations (see Kazazian et al., 1984, Nature 310:152-154;
Kazazian et al., 1984, EMBO J. 3:593-596; and Zhang et al., 1988, Hum. Genet. 78:37-
40). In Mediterranean populations, eight mutations are responsible for over 90% of the 
beta-thalassemia alleles. 

Probes were synthesized that are specific for each of these eight mutations as well 
as their corresponding normal sequences. The probes were given 400 nt poly-dT tails with 
terminal transferase and applied to membranes. Various amounts of each probe were 
applied to twelve duplicate nylon filters, irradiated at 40 mJ/cm2, hybridized with amplified 
beta-globin sequences in genomic DNA samples, and color developed. The result is 
shown in Figure 6. In the figure, the beta-thalassemia locus that is detected by each
immobilized probe pair is written at the top of the filters. For each filter, the upper row
contains the probes that are specific for the normal sequence, and the lower row contains
the probes specific for the mutant sequences. The beta-globin genotype of each sample is
noted at the right of the corresponding filter. The name, amount applied to the membrane
(in pmols and noted parenthetically), specificity, and sequence of each probe are shown
below.






Probe           Allele Specificity    Sequence

RS187 (8)       Normal Beta1-110      5'-TAGACCAATAGGCAGAGAG
RS188 (8)       Mutant Beta1-110      5'-CTCTCTGCCTATTAGTCTA
RS87 (4)        Normal Beta39         5'-CCTTGGACCCAGAGGTTCT
RS89 (4)        Mutant Beta39         5'-AGAACCTCTAGGTCCAAGG
RS189 (0.33)    Normal Beta1-6        5'-CTTGATACCAACCTGCCA
RS190 (0.33)    Mutant Beta1-6        5'-TGGGCAGGTTGGCATCAAG
RS191 (1)       Mutant Beta1-1        5'-TGGGCAGATTGGTATCAAG
RS192 (4)       Normal Beta2-1        5'-CCATAGACTCACCCTGAAG
RS193 (4)       Mutant Beta2-1        5'-CTTCAGGATGAGTCTATGG
RS201 (2)       Normal Beta2-745      5'-GCAGAATGGTAGCTGGATT
RS202 (2)       Mutant Beta2-745      5'-GCAGAATGGTACCTGGATT
RS196 (4)       Normal Beta6-8        5'-ACTCCTGAGGAGAAGTCTG
RS197 (4)       Mutant Beta6          5'-GACTCCTGGGAGAAGTCTG
RS198 (4)       Mutant Beta8          5'-TGACTCCTGAGGAGGTCTG



Because the beta-thalassemia mutations are distributed throughout the beta-globin 
gene, biotinylated PCR primers that amplify the entire gene in a single 1780 bp amplified 
product were used. The primers used for the amplification are shown below. 

RS151 is 5'-ATCACTTAGACCTCACCCTG

RS152 is 5'-GACCTCCCACATTCCCTTTT
This amplification product encompasses all known beta-thalassemia mutations. Following 
hybridization and color development, the beta-globin genotypes could be determined by 
noting the pattern of hybridization, as shown in Figure 6. 

Unlike the DQalpha typing system, two probes are needed to analyze each mutation
(one specific for the normal sequence and one specific for the mutant sequence) to
differentiate normal/mutant heterozygous carriers from mutant/mutant homozygotes. A
complicating factor in this analysis is caused by apparent secondary structure in various
portions of the relatively long beta-globin amplification product that interferes with probe 
hybridization. The relatively high stringency needed to minimize this secondary structure 
requires the use of longer (19 nt hybridizing regions) probes to capture the amplified beta- 
globin fragment. Because this constraint would not permit varying the length of the probes 
to compensate for different hybridization efficiencies, the balancing of signal intensities 
was accomplished by adjusting the amount of each oligonucleotide applied to the 
membrane. 
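
The reason a normal/mutant probe pair is needed at each locus can be made concrete with a small decision rule. The Python sketch below is illustrative only: it assumes each spot has already been scored as positive or negative, and the function name and result labels are hypothetical rather than part of the described method.

```python
def call_locus(normal_signal: bool, mutant_signal: bool) -> str:
    """Interpret one normal/mutant probe pair for a single mutation site."""
    if normal_signal and mutant_signal:
        return "normal/mutant (heterozygous carrier)"
    if mutant_signal:
        return "mutant/mutant (homozygous)"
    if normal_signal:
        return "normal/normal at this site"
    # Neither probe reacted: for example, too little amplified DNA was applied.
    return "no call"


# With only the mutant probe, the first two cases would be indistinguishable.
for pair in [(True, True), (False, True), (True, False), (False, False)]:
    print(pair, "->", call_locus(*pair))
```

A mutant-specific probe alone reports a positive spot for both carriers and homozygotes; it is the presence or absence of the normal-probe signal that separates the two.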
We Claim:

1. An assay reagent comprising an oligonucleotide probe immobilized on a solid
support, said probe comprising a hybridizing region that is a nucleotide sequence
complementary to a specific nucleotide sequence to be detected covalently attached to a
spacer arm which is longer than said hybridizing region, wherein said probe is immobilized
on said solid support by a covalent bond between said solid support and said spacer arm,
such that the hybridizing region of said probe can hybridize to said nucleotide sequence to
be detected under hybridizing conditions.

2. The assay reagent of Claim 1, wherein said spacer arm is a polynucleotide
tail, said solid support contains primary or secondary amines prior to attachment of said
probe to said support, and said attachment is formed by ultraviolet light irradiation of said
probe on said support.

3. The assay reagent of Claim 2, wherein said probe is an
oligodeoxyribonucleotide.

4. The assay reagent of Claim 3, wherein said tail contains from 200 to 800
nucleotides.

5. The assay reagent of Claim 4, wherein said support comprises nylon.

6. The assay reagent of Claim 4, wherein said tail comprises at least 150
pyrimidine nucleotides.

7. The assay reagent of Claim 6, wherein said pyrimidine nucleotides are
thymidine nucleotides.

8. The assay reagent of Claim 6, wherein said hybridizing region is a sequence
of nucleotides from 17 to 23 nucleotides in length.

9. The assay reagent of Claim 1 that comprises a set of probes immobilized on
a solid support, wherein said set of probes comprises two or more members, each member
of said set having a hybridization region different from every other member of said set,
wherein each member is immobilized on said solid support at a discrete location separate
from every other probe of said set.

10. The assay reagent of Claim 9, wherein each probe of said set of probes is an
oligodeoxyribonucleotide.

11. The assay reagent of Claim 9, wherein one member probe of said set serves
as a positive control.

12. The assay reagent of Claim 9, wherein said member probes of said set are
complementary to nucleic acid sequences of microorganisms.

13. The assay reagent of Claim 9, wherein said member probes of said set are
complementary to variant alleles of a genetic locus.

14. The assay reagent of Claim 13, wherein said genetic locus is an HLA locus.

15. The assay reagent of Claim 14, wherein said HLA locus is DQalpha.

16. The assay reagent of Claim 9, wherein said spacer arm is a polynucleotide
tail, and said solid support contains primary or secondary amines prior to attachment of
said probe to said support, and said attachment is formed by ultraviolet light irradiation of
said probe on said support.

17. A method for preparing the assay reagent of Claim 2, which method
comprises: (a) contacting said probe with a solid support comprising amine groups; and
(b) irradiating the support prepared in step (a) with ultraviolet light.

18. A method for detecting the presence of a nucleic acid sequence in a sample,
which method comprises: (a) contacting said sample with the assay reagent of Claim 1
under conditions that allow for hybridization of complementary nucleic acid sequences; and
(b) determining if hybridization has occurred.

19. A kit comprising the assay reagent of Claim 1 and instructions for detecting
specific nucleic acids with said reagent.

20. An assay reagent comprising a set of oligonucleotide probes covalently
attached to a solid support, wherein said set of probes comprises two or more members,
each member of said set having a hybridization region different from every other member
of said set, wherein each member is immobilized on said support at a discrete location
separate from every other probe of said set.

21. The reagent of Claim 20, also comprising a labeled polynucleotide
hybridized to one of said probes, wherein said polynucleotide comprises at least 50
nucleotides.

22. The reagent of Claim 20, also comprising a target sequence from a sample
hybridized to an immobilized probe of said set and a colored or fluorescent compound
immobilized on said support at the location of said hybridized probe.